INEQUALITIES RELATED TO FREE ENTROPY DERIVED FROM 
RANDOM MATRIX APPROXIMATION 
Abstract. Biane proved the free analog of the logarithmic Sobolev inequality for probability measures on $\mathbb{R}$ by means of a random matrix approximation procedure. We show that the same method can be applied to reprove Biane and Voiculescu's free analog of Talagrand's transportation cost inequality for measures on $\mathbb{R}$. Furthermore, we prove the free analogs of the logarithmic Sobolev inequality and the transportation cost inequality for measures on $\mathbb{T}$ as well, by extending the method to special unitary random matrices.

Introduction 

Since its first systematic study by L. Gross [12] in 1975, the logarithmic Sobolev inequality (LSI) has been discussed by many authors in various contexts, in particular in close connection with the notions of hypercontractivity and spectral gap. An LSI can be understood as comparing the relative Fisher information with the relative entropy. Among other things, we refer here to the LSI due to D. Bakry and M. Emery [1] in the general Riemannian manifold setting, which is quite useful for our present purpose. Another interesting inequality, called the transportation cost inequality (TCI), was presented by M. Talagrand [28] in 1996. A TCI compares the (quadratic) Wasserstein distance $W(\mu,\nu)$ between probability measures $\mu,\nu$ (for the definition see (4.1) in §4 of this paper) with $\sqrt{S(\mu,\nu)}$, the square root of the relative entropy. Indeed, in [28] Talagrand proved the inequality $W(\mu,\nu) \le \sqrt{2S(\mu,\nu)}$ when $\nu$ is the standard Gaussian measure on $\mathbb{R}^n$, and an exposition in the case of more general $\nu$ can be found in [21], for example. On the other hand, in [25] F. Otto and C. Villani succeeded in discovering links between the LSI and the TCI in the Riemannian manifold setting. This, combined with [1], implies the TCI in the same situation as Bakry and Emery's LSI. See [20, 21, 29] for more about these classical LSI and TCI as well as related topics.
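As an illustration of Talagrand's inequality on $\mathbb{R}$ (a sketch of ours, not taken from [28]), the Python snippet below computes $W(\mu,\nu)$ in dimension one through the quantile coupling, which is optimal for the quadratic cost on $\mathbb{R}$, and compares it with $\sqrt{2S(\mu,\nu)}$ for $\mu = N(m,s^2)$ and $\nu = N(0,1)$. The helper names and the midpoint quadrature are our own assumptions. For a pure shift ($s = 1$) the inequality becomes an equality.

```python
import math
from statistics import NormalDist


def wasserstein2_1d(q_mu, q_nu, n=20000):
    """Quadratic Wasserstein distance on R via the quantile coupling:
    W^2 = int_0^1 (q_mu(u) - q_nu(u))^2 du, approximated by the midpoint rule."""
    total = 0.0
    for k in range(n):
        u = (k + 0.5) / n
        total += (q_mu(u) - q_nu(u)) ** 2
    return math.sqrt(total / n)


def relative_entropy_gaussian(m, s):
    """Closed form of S(N(m, s^2), N(0, 1)) = int log(d mu / d nu) d mu."""
    return 0.5 * (s * s + m * m - 1.0 - 2.0 * math.log(s))


def talagrand_check(m, s):
    """Return (W(mu, nu), sqrt(2 S(mu, nu))) for mu = N(m, s^2) and nu = N(0, 1)."""
    mu, nu = NormalDist(m, s), NormalDist(0.0, 1.0)
    w = wasserstein2_1d(mu.inv_cdf, nu.inv_cdf)
    return w, math.sqrt(2.0 * relative_entropy_gaussian(m, s))


for m, s in [(1.0, 1.0), (0.5, 2.0), (-1.0, 0.5)]:
    w, bound = talagrand_check(m, s)
    print(f"m={m:+.1f}, s={s:.1f}: W={w:.4f} <= sqrt(2S)={bound:.4f}")
```

For Gaussians the comparison reduces to $\log s \le s - 1$, since $W^2 = m^2 + (s-1)^2$ and $2S = s^2 + m^2 - 1 - 2\log s$.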

The relative free entropy $\Sigma_Q(\mu)$ and the relative free Fisher information $\Phi_Q(\mu)$ were introduced by Ph. Biane and R. Speicher [5] for $\mu \in \mathcal{M}(\mathbb{R})$, the probability measures on $\mathbb{R}$, relative to a real continuous function $Q$ on $\mathbb{R}$, where $Q$ has a certain growth in the case of $\Sigma_Q(\mu)$ and is a $C^1$ function in the case of $\Phi_Q(\mu)$. Note that $\Sigma_Q(\mu)$ is regarded as the relative version of the free entropy $\Sigma(\mu)$ introduced by D. Voiculescu [30], just as the classical relative entropy is the relative version of the Boltzmann-Gibbs entropy, while $\Phi_Q(\mu)$ in the case $Q \equiv 0$ reduces to the free Fisher information $\Phi(\mu)$ in [30]. (The "free relative entropy" $\Sigma(\mu,\nu)$ for two measures was introduced in [13] from a slightly different viewpoint.) In this paper we introduce the relative free entropy $\Sigma_Q(\mu)$ and the relative free Fisher information $F_Q(\mu)$ for $\mu \in \mathcal{M}(\mathbb{T})$, the probability measures on the unit circle $\mathbb{T}$, likewise relative to a real continuous function $Q$ on $\mathbb{T}$ (required to be $C^1$ for $F_Q(\mu)$). When $Q \equiv 0$ the quantity $F_Q(\mu)$ becomes the free Fisher information $F(\mu)$ introduced by Voiculescu [33]. An important fact is that the relative free entropy $\Sigma_Q(\mu)$ is the rate function (or, up to an additive constant, the so-called weighted logarithmic integral) of a large deviation principle for the empirical eigenvalue distribution of a certain random matrix. Indeed, $\Sigma_Q(\mu)$ for $\mu \in \mathcal{M}(\mathbb{R})$ is the good rate function of the large deviation principle for the $n \times n$ self-adjoint random matrix determined by the function $Q$, while $\Sigma_Q(\mu)$ for $\mu \in \mathcal{M}(\mathbb{T})$ is that for the $n \times n$ (special) unitary random matrix associated with $Q$. The definitions of these quantities as well as related matters are collected in §1 of this paper.

Voiculescu's inequality in [32, Proposition 7.9] is the first free probabilistic analog of the LSI. Extending its single variable case (see (2.4) in §2), Biane obtained in [4] the following free LSI:

$$\Sigma_Q(\mu) \le \frac{1}{2\rho}\,\Phi_Q(\mu) \quad \text{for } \mu \in \mathcal{M}(\mathbb{R})$$

if $Q''(x) \ge \rho$ on $\mathbb{R}$ with a constant $\rho > 0$. To prove this, Biane applied the classical LSI on Euclidean space to the related self-adjoint random matrices mentioned above and used the weak convergence of their mean eigenvalue distributions. Although the differentiability assumption on $Q$ is not quite explicitly stated in [4], Biane's free LSI is certainly valid if $Q$ is a $C^1$ function such that $Q(x) - \frac{\rho}{2}x^2$ is convex on $\mathbb{R}$. For the sake of completeness, in §2 we give a proof of this general case by a usual approximation technique.

The first main aim of this paper is to show a variant of Biane's free LSI for measures on $\mathbb{T}$. In §3 we prove

$$\Sigma_Q(\mu) \le \frac{1}{2\rho+1}\,F_Q(\mu) \quad \text{for } \mu \in \mathcal{M}(\mathbb{T})$$

if $Q$ is a $C^1$ function on $\mathbb{T}$ such that $Q(e^{\sqrt{-1}t}) - \frac{\rho}{2}t^2$ is convex on $\mathbb{R}$ with a constant $\rho > -1/2$. The proof is based on random matrix approximation. We apply Bakry and Emery's classical LSI on the special unitary group $SU(n)$, a Riemannian manifold, to the related $n \times n$ special unitary random matrices and pass to the scaling limit as $n \to \infty$. Here we need the convergence of the empirical eigenvalue distribution of the random matrix not only in mean but also almost surely, which is a consequence of the corresponding large deviation principle. Although the large deviation theorem (Theorem 1.2 below) for "special" unitary random matrices is essentially the same as that for unitary random matrices shown in [16], its proof is a bit more complicated, so we sketch it in the Appendix for the convenience of the reader. We also need a few facts from differential geometry, in particular the exact computation of the Ricci curvature tensor of $SU(n)$ (with respect to the Riemannian structure associated with the usual trace on $M_n(\mathbb{C})$), to check the so-called Bakry and Emery criterion (see §§1.7).

In [6] Biane and Voiculescu obtained the free analog of Talagrand's TCI for compactly supported $\mu \in \mathcal{M}(\mathbb{R})$ as follows:

$$W(\mu, \gamma_{0,2}) \le \sqrt{2\left(-\Sigma(\mu) + \int \frac{x^2}{2}\,d\mu(x) - \frac{3}{4}\right)},$$

where $\gamma_{0,2}$ denotes the standard semicircular distribution (with radius 2). Their proof involves the free process and the complex Burgers equation, and it is a free probabilistic realization, parallel to [25], of not only the result itself but also its proof. The proof itself justifies the above inequality as the right free analog of Talagrand's TCI.
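To see the Biane-Voiculescu inequality at work on an explicit family, one can take $\mu = \gamma_{0,r}$, the semicircular law of radius $r$. The Python sketch below (our illustration; function names are hypothetical) computes $W(\gamma_{0,r},\gamma_{0,2})$ from the quantile function of $\gamma_{0,2}$ and evaluates the right-hand side using the standard closed forms $\Sigma(\gamma_{0,r}) = \log(r/2) - 1/4$ and $\int x^2\,d\gamma_{0,r}(x) = r^2/4$, which we take as known. In this family $W = |r/2 - 1|$, and the inequality reduces to $\log s \le s - 1$ with $s = r/2$.

```python
import math


def semicircle_cdf(x):
    """CDF of the standard semicircular law gamma_{0,2} (density sqrt(4 - x^2) / (2 pi))."""
    if x <= -2.0:
        return 0.0
    if x >= 2.0:
        return 1.0
    return 0.5 + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi) + math.asin(x / 2.0) / math.pi


def semicircle_quantile(u, tol=1e-11):
    """Inverse CDF of gamma_{0,2} by bisection (the CDF is continuous and increasing)."""
    lo, hi = -2.0, 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if semicircle_cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


N_GRID = 2000
QUANTILES = [semicircle_quantile((k + 0.5) / N_GRID) for k in range(N_GRID)]


def free_tci_check(r):
    """Return (W(gamma_{0,r}, gamma_{0,2}), Biane-Voiculescu bound) for mu = gamma_{0,r}."""
    # gamma_{0,r} is the image of gamma_{0,2} under x -> (r/2) x, so its quantile
    # function is (r/2) times that of gamma_{0,2}; the quantile coupling is optimal on R.
    w2 = sum(((r / 2.0) * q - q) ** 2 for q in QUANTILES) / N_GRID
    # Closed forms assumed known: Sigma(gamma_{0,r}) = log(r/2) - 1/4 and
    # int x^2 d gamma_{0,r} = r^2 / 4.
    sigma = math.log(r / 2.0) - 0.25
    second_moment = r * r / 4.0
    bound = math.sqrt(max(0.0, 2.0 * (-sigma + second_moment / 2.0 - 0.75)))
    return math.sqrt(w2), bound


for r in (1.0, 2.0, 3.0, 4.0):
    w, bound = free_tci_check(r)
    print(f"r={r}: W={w:.5f}, bound={bound:.5f}")
```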

Our second main aim is to reprove Biane and Voiculescu's free TCI in a slightly more general setting by making use of random matrix approximation, and furthermore to give a free TCI for measures on $\mathbb{T}$ in a similar way. This aim was our initial motivation: we first wanted to find another proof of Biane and Voiculescu's TCI by means of random matrix approximation along the lines of Voiculescu's heuristics in [30], and to justify Biane and Voiculescu's TCI as the right free analog from the viewpoint of random matrix theory. In §4 we prove the free TCI

$$W(\mu, \mu_Q) \le \sqrt{\frac{2}{\rho}\,\Sigma_Q(\mu)} \quad \text{for compactly supported } \mu \in \mathcal{M}(\mathbb{R})$$

if $Q$ is a real function on $\mathbb{R}$ such that $Q(x) - \frac{\rho}{2}x^2$ is convex with a constant $\rho > 0$, where $\mu_Q$ is the equilibrium measure associated with $Q$ (i.e., the unique minimizer of $\Sigma_Q(\mu)$ over $\mu \in \mathcal{M}(\mathbb{R})$).
When $Q(x) = x^2/2$ and $\rho = 1$, this becomes Biane and Voiculescu's TCI. To prove this, we first suppose that $\mu$ is supported in $[-R,R]$ and that $Q_\mu(x) := 2\int_{-R}^{R}\log|x-y|\,d\mu(y)$ is continuous on $\mathbb{R}$. We consider two $n \times n$ self-adjoint random matrices: one is associated with $Q$, and the other is associated with $Q_\mu$ and restricted to the $n \times n$ self-adjoint matrices with operator norm $\le R$. These random matrices are probability measures on the space of $n \times n$ self-adjoint matrices ($\cong \mathbb{R}^{n^2}$), and the classical TCI for these measures asymptotically approaches, as $n \to \infty$, the free TCI we want. The case of a general compactly supported $\mu \in \mathcal{M}(\mathbb{R})$ can be treated by an approximation technique. Furthermore, as presented in §5,
a similar method using special unitary random matrices works to prove the free TCI

$$W(\mu, \mu_Q) \le \sqrt{\frac{2}{\rho + 1/2}\,\Sigma_Q(\mu)} \quad \text{for } \mu \in \mathcal{M}(\mathbb{T})$$

if $Q$ is a real function on $\mathbb{T}$ as in the free LSI, that is, $Q(e^{\sqrt{-1}t}) - \frac{\rho}{2}t^2$ is convex on $\mathbb{R}$ with $\rho > -1/2$. Here $W(\mu,\mu_Q)$ is the Wasserstein distance with respect to the geodesic distance (i.e., the angular distance) on $\mathbb{T}$. In the particular case where $Q \equiv 0$ and $\rho = 0$, we have $W(\mu, d\theta/2\pi) \le 2\sqrt{-\Sigma(\mu)}$.

In this way, we clarify the advantage of the random matrix approximation procedure in studying free probabilistic analogs of certain classical theories involving relative entropy and/or Fisher information. The present paper may be regarded as one more attempt, subsequent to [2, 13], toward rigorous realizations of Voiculescu's heuristics in [30], which assert that the classical entropy of random matrices, if suitably arranged, asymptotically converges to the free entropy of the limit distribution as the matrix size goes to infinity.

The final §6 is a collection of remarks, examples and related results; in particular, we give the variants of the above free LSI and TCI for measures on the half line $\mathbb{R}^+$.



1. Preliminaries 

The purpose of this preliminary section is to summarize, for the convenience of the reader, 
the basic notions and the results which will be needed later. We will use them with no explicit 
explanation in the main part of this paper. 

1.1. Notations. The set of all Borel probability measures on a Polish space $X$ is denoted by $\mathcal{M}(X)$. The Dirac measure at a point $x \in X$ is denoted by $\delta_x$ as usual. For $\mu,\nu \in \mathcal{M}(X)$, the relative entropy of $\mu$ with respect to $\nu$ is denoted by $S(\mu,\nu)$, which is defined by

$$S(\mu,\nu) := \int_X \log\frac{d\mu}{d\nu}\,d\mu \qquad (1.1)$$

when $\mu$ is absolutely continuous with respect to $\nu$; otherwise $S(\mu,\nu) := +\infty$.

The usual trace on $M_n(\mathbb{C})$, the $n \times n$ complex matrices, is denoted by $\mathrm{Tr}_n$. The Hilbert-Schmidt norm on $M_n(\mathbb{C})$ induced from $\mathrm{Tr}_n$ is denoted by $\|\cdot\|_{HS}$, i.e., $\|A\|_{HS} := \mathrm{Tr}_n(A^*A)^{1/2}$ for $A \in M_n(\mathbb{C})$. Let $M_n^{sa}$ denote the set of all $n \times n$ self-adjoint matrices, $U(n)$ the group of all $n \times n$ unitaries, and $SU(n)$ the special unitary group of order $n$, i.e., the group of all $n \times n$ unitaries whose determinants are equal to one.

1.2. Free entropy and free Fisher information for measures on R. The notions of free entropy and free Fisher information are the free probabilistic analogs of the Boltzmann-Gibbs entropy and the Fisher information in classical theory. For each $\mu \in \mathcal{M}(\mathbb{R})$, Voiculescu [30] introduced the free entropy of $\mu$

$$\Sigma(\mu) := \iint_{\mathbb{R}^2}\log|x-y|\,d\mu(x)\,d\mu(y),$$

which is the negative of the so-called logarithmic energy of $\mu$, a quantity useful in potential theory (see [26]). It is the "main component" of the free entropy $\chi(\mu)$ of $\mu$ introduced in [31]:

$$\chi(\mu) = \Sigma(\mu) + \frac{3}{4} + \frac{1}{2}\log 2\pi. \qquad (1.2)$$
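Numerically, $\Sigma(\mu)$ can be evaluated for $\mu$ supported in $[-2,2]$ through the classical expansion $\log|2\cos a - 2\cos b| = -\sum_{k\ge1}\frac{2}{k}\cos ka\cos kb$, which gives $\Sigma(\mu) = -\sum_{k\ge1}\frac{2}{k}c_k^2$ with $c_k = \int\cos(k\theta)\,d\mu$ in the variable $x = 2\cos\theta$. The Python sketch below uses this (our choice of method, with a truncation level that is an assumption) to recover $\Sigma(\gamma_{0,2}) = -1/4$, hence $\chi(\gamma_{0,2}) = \frac12\log(2\pi e)$ by (1.2), and $\Sigma = 0$ for the arcsine law on $[-2,2]$.

```python
import math


def free_entropy(density, n_theta=1000, n_terms=100):
    """Sigma(mu) = iint log|x - y| dmu(x) dmu(y) for mu(dx) = density(x) dx on [-2, 2].

    Substituting x = 2 cos(theta) and using
        log|2cos a - 2cos b| = -sum_{k>=1} (2/k) cos(k a) cos(k b),
    we get Sigma(mu) = -sum_{k>=1} (2/k) c_k^2 with c_k = int cos(k theta) dmu.
    The theta-integrals use the midpoint rule; the series is cut at n_terms.
    """
    h = math.pi / n_theta
    thetas = [(j + 0.5) * h for j in range(n_theta)]
    # mass of each theta-cell: density(x) * |dx/dtheta| * h with x = 2 cos(theta)
    weights = [density(2.0 * math.cos(t)) * 2.0 * math.sin(t) * h for t in thetas]
    sigma = 0.0
    for k in range(1, n_terms + 1):
        c_k = sum(w * math.cos(k * t) for w, t in zip(weights, thetas))
        sigma -= (2.0 / k) * c_k * c_k
    return sigma


def semicircle(x):
    """Density of gamma_{0,2}."""
    return math.sqrt(max(4.0 - x * x, 0.0)) / (2.0 * math.pi)


def arcsine(x):
    """Density of the arcsine law on (-2, 2)."""
    return 1.0 / (math.pi * math.sqrt(4.0 - x * x))


sigma_sc = free_entropy(semicircle)
chi_sc = sigma_sc + 0.75 + 0.5 * math.log(2.0 * math.pi)  # formula (1.2)
print(sigma_sc, chi_sc, 0.5 * math.log(2.0 * math.pi * math.e))
print(free_entropy(arcsine))
```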

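Assume that $\mu \in \mathcal{M}(\mathbb{R})$ has the density $p = d\mu/dx$ (with respect to the Lebesgue measure $dx$) belonging to the $L^3$-space $L^3(\mathbb{R}) := L^3(\mathbb{R}, dx)$. In [30] Voiculescu also introduced the free Fisher information of $\mu$

$$\Phi(\mu) := \frac{4\pi^2}{3}\int_{\mathbb{R}} p(x)^3\,dx.$$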

The Hilbert transform of $p$

$$(Hp)(x) := \lim_{\varepsilon \searrow 0}\int_{|x-t|>\varepsilon}\frac{p(t)}{x-t}\,dt \qquad (1.3)$$

plays an important role in the study of free Fisher information. The limit in (1.3) exists for a.e. $x \in \mathbb{R}$ (as long as $p \in L^q(\mathbb{R})$ with $1 \le q < \infty$), and it is known that $p \in L^q(\mathbb{R})$ implies $Hp \in L^q(\mathbb{R})$ for each $1 < q < \infty$. See [19, Chapter VI] for the Hilbert transform on $\mathbb{R}$. As shown in [30, Lemma 3.3],

$$\int_{\mathbb{R}}((Hp)(x))^2 p(x)\,dx = \frac{\pi^2}{3}\int_{\mathbb{R}} p(x)^3\,dx, \qquad (1.4)$$

and hence the free Fisher information has the alternative description

$$\Phi(\mu) = 4\int_{\mathbb{R}}((Hp)(x))^2 p(x)\,dx = 4\int_{\mathbb{R}}((Hp)(x))^2\,d\mu(x).$$

We remark that the Hilbert transform is usually defined with an additional multiplicative constant $1/\pi$, in which case $\int_{\mathbb{R}}((Hp)(x))^2 p(x)\,dx = \frac{1}{3}\int_{\mathbb{R}} p(x)^3\,dx$ holds instead of (1.4).
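The principal value in (1.3) can be evaluated by subtracting the singularity: for $a < x < b$ and $p$ supported in $[a,b]$, $(Hp)(x) = \int_a^b\frac{p(t)-p(x)}{x-t}\,dt + p(x)\log\frac{x-a}{b-x}$. The Python sketch below (our illustration; names and grid sizes are assumptions) applies this to the standard semicircular density, for which $(Hp)(x) = x/2$ on $(-2,2)$, so that $\Phi(\gamma_{0,2}) = 4\int(x/2)^2\,d\gamma_{0,2}(x) = 1$. For comparison it also evaluates $\Phi = \frac{4\pi^2}{3}\int p^3\,dx$, which follows from (1.4) and the alternative description of $\Phi$.

```python
import math


def hilbert_transform(p, x, a, b, n=50000):
    """(Hp)(x) = p.v. int p(t) / (x - t) dt, as in (1.3), for p supported in [a, b], a < x < b.

    Singularity subtraction: p(t) = (p(t) - p(x)) + p(x).  The first part gives an ordinary
    integral (midpoint rule); for the second, p.v. int_a^b dt / (x - t) = log((x - a) / (b - x)).
    """
    px = p(x)
    h = (b - a) / n
    regular = 0.0
    for j in range(n):
        t = a + (j + 0.5) * h
        if t != x:  # the integrand is bounded near t = x; a single node may be skipped
            regular += (p(t) - px) / (x - t)
    return regular * h + px * math.log((x - a) / (b - x))


def semicircle(x):
    """Density of gamma_{0,2}."""
    return math.sqrt(max(4.0 - x * x, 0.0)) / (2.0 * math.pi)


def free_fisher_from_cubes(p, a, b, n=50000):
    """Phi(mu) = (4 pi^2 / 3) int p(x)^3 dx for p supported in [a, b] (midpoint rule)."""
    h = (b - a) / n
    return 4.0 * math.pi ** 2 / 3.0 * h * sum(p(a + (j + 0.5) * h) ** 3 for j in range(n))


for x in (-1.3, 0.0, 0.7):
    print(f"x={x}: Hp={hilbert_transform(semicircle, x, -2.0, 2.0):.6f}, x/2={x / 2:.6f}")
print("Phi(gamma_{0,2}) =", free_fisher_from_cubes(semicircle, -2.0, 2.0))
```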

Let $Q$ be a real-valued $C^1$ function on $\mathbb{R}$. For each $\mu \in \mathcal{M}(\mathbb{R})$, Biane and Speicher [5, §6] introduced the relative free Fisher information $\Phi_Q(\mu)$ of $\mu$ relative to $Q$, defined to be

$$\Phi_Q(\mu) := 4\int_{\mathbb{R}}\left((Hp)(x) - \frac{1}{2}Q'(x)\right)^2 d\mu(x) \qquad (1.5)$$

when $\mu$ has the density $p = d\mu/dx$ belonging to $L^3(\mathbb{R})$; otherwise it is defined to be $+\infty$.
1.3. Free entropy and free Fisher information for measures on T. For each $\mu \in \mathcal{M}(\mathbb{T})$, the free entropy of $\mu$ is defined in the same manner as in the real line case; that is,

$$\Sigma(\mu) := \iint_{\mathbb{T}^2}\log|\zeta-\eta|\,d\mu(\zeta)\,d\mu(\eta)$$

([33, §§10.7], [15]). For its justification as the right quantity, see [33, Proposition 10.8] in relation to the free Fisher information, as well as [15, Proposition 1.4] and [16] from the microstate approach or the large deviation principle.

Assume that $\mu \in \mathcal{M}(\mathbb{T})$ has the density $p = d\mu/d\zeta$ with respect to the Haar probability measure $d\zeta = d\theta/2\pi$, $\zeta = e^{\sqrt{-1}\theta}$ with $\theta \in [-\pi,\pi)$, and further that $p$ belongs to the $L^3$-space $L^3(\mathbb{T}) := L^3(\mathbb{T}, d\zeta)$. As in the real line case, the Hilbert transform of $p$

$$(Hp)(e^{\sqrt{-1}\theta}) := \lim_{\varepsilon \searrow 0}\int_{\varepsilon<|t|\le\pi}\frac{p(e^{\sqrt{-1}(\theta-t)})}{\tan(t/2)}\,\frac{dt}{2\pi} \qquad (1.6)$$

is important. The principal value limit in (1.6) exists for a.e. $\theta$ (as long as $p \in L^1(\mathbb{T})$), and it is known that $p \in L^q(\mathbb{T})$ implies $Hp \in L^q(\mathbb{T})$ as well for each $1 < q < \infty$. See [19, Chapter V] for detailed accounts of the Hilbert transform on $\mathbb{T}$. Following Voiculescu [33, §§8.9] we call the quantity

$$F(\mu) := \int_{\mathbb{T}}((Hp)(\zeta))^2\,d\mu(\zeta) = \int_{\mathbb{T}}((Hp)(\zeta))^2 p(\zeta)\,d\zeta$$

the free Fisher information of $\mu$. When $\mu$ has no such density as above, $F(\mu)$ is defined to be $+\infty$. By [33, Corollary 8.8 and Definition 8.9] the free Fisher information can be written as

$$F(\mu) = \frac{1}{3}\left(-1 + \int_{\mathbb{T}} p(\zeta)^3\,d\zeta\right).$$
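For a trigonometric density these quantities can be checked by hand: if $p(e^{\sqrt{-1}\theta}) = 1 + a\cos\theta$ with $|a| < 1$, then $(Hp)(e^{\sqrt{-1}\theta}) = a\sin\theta$ and both formulas for the free Fisher information give $a^2/2$. The Python sketch below (our illustration; function names are hypothetical) evaluates (1.6) after pairing $t$ with $-t$, which removes the singularity, and compares $\int(Hp)^2\,d\mu$ with $\frac13\left(-1+\int p^3\,d\zeta\right)$.

```python
import math


def circle_hilbert(p, theta, n=1000):
    """(Hp)(e^{i theta}) from (1.6), with p given as a function of the angle.

    Pairing t with -t turns the principal value into the regular integral
        int_0^pi (p(theta - t) - p(theta + t)) / tan(t / 2) dt / (2 pi),
    which is approximated by the midpoint rule.
    """
    h = math.pi / n
    total = 0.0
    for j in range(n):
        t = (j + 0.5) * h
        total += (p(theta - t) - p(theta + t)) / math.tan(t / 2.0)
    return total * h / (2.0 * math.pi)


def free_fisher_circle(p, n=100):
    """Return F(mu) computed as int (Hp)^2 p dzeta and as (1/3)(-1 + int p^3 dzeta)."""
    thetas = [-math.pi + (j + 0.5) * 2.0 * math.pi / n for j in range(n)]
    via_hilbert = sum(circle_hilbert(p, th) ** 2 * p(th) for th in thetas) / n
    via_cubes = (-1.0 + sum(p(th) ** 3 for th in thetas) / n) / 3.0
    return via_hilbert, via_cubes


a = 0.5
p = lambda th: 1.0 + a * math.cos(th)
print(circle_hilbert(p, 0.8), a * math.sin(0.8))
print(free_fisher_circle(p), a * a / 2.0)
```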



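Let $Q$ be a real-valued $C^1$ function on $\mathbb{T}$. As in the case of measures on $\mathbb{R}$, for each $\mu \in \mathcal{M}(\mathbb{T})$ we define the relative free Fisher information $F_Q(\mu)$ to be

$$F_Q(\mu) := \int_{\mathbb{T}}\left((Hp)(\zeta) - Q'(\zeta)\right)^2 d\mu(\zeta) - \left(\int_{\mathbb{T}} Q'(\zeta)\,d\mu(\zeta)\right)^2 \qquad (1.7)$$

when $\mu$ has the density $p = d\mu/d\zeta$ belonging to $L^3(\mathbb{T})$; otherwise it is defined to be $+\infty$. Here $Q'$ means the derivative of $Q(e^{\sqrt{-1}\theta})$ in $\theta$, i.e., $Q'(e^{\sqrt{-1}\theta}) = \frac{d}{d\theta}Q(e^{\sqrt{-1}\theta})$. The slight difference between the two formulas (1.5) and (1.7) is worth noting.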

1.4. Large deviations for self-adjoint random matrices. Let $Q$ be a real-valued continuous function on $\mathbb{R}$ such that

$$\lim_{|x|\to+\infty}|x|\exp(-\varepsilon Q(x)) = 0 \quad \text{for every } \varepsilon > 0. \qquad (1.8)$$

The weighted energy integral associated with $Q$ is defined by

$$E_Q(\mu) := -\Sigma(\mu) + \int_{\mathbb{R}} Q(x)\,d\mu(x) \quad \text{for } \mu \in \mathcal{M}(\mathbb{R}).$$

According to a fundamental result in the theory of weighted potentials (see [26, 1.1.3]), there exists a unique $\mu_Q \in \mathcal{M}(\mathbb{R})$ such that

$$E_Q(\mu_Q) = \inf\{E_Q(\mu) : \mu \in \mathcal{M}(\mathbb{R})\},$$

and $E_Q(\mu_Q)$ is finite (hence so is $\Sigma(\mu_Q)$). Moreover, $\mu_Q$ is known to be compactly supported. The minimizer $\mu_Q$ is sometimes called the equilibrium measure associated with $Q$.
Set $B(Q) := -E_Q(\mu_Q)$ so that the function

$$-\Sigma(\mu) + \int_{\mathbb{R}} Q(x)\,d\mu(x) + B(Q) \quad \text{for } \mu \in \mathcal{M}(\mathbb{R}) \qquad (1.9)$$

is non-negative and is zero only when $\mu = \mu_Q$. It is well known that if $Q(x) = 2x^2/r^2$ with $r > 0$, then the equilibrium measure (or the unique minimizer) $\mu_Q$ is the $(0, r^2/4)$-semicircular distribution $\gamma_{0,r}$ (with variance $r^2/4$):

$$d\gamma_{0,r}(x) := \frac{2}{\pi r^2}\sqrt{r^2-x^2}\,\chi_{[-r,r]}(x)\,dx. \qquad (1.10)$$

For each $n \in \mathbb{N}$ define $\lambda_n(Q) \in \mathcal{M}(M_n^{sa})$, the $n \times n$ self-adjoint random matrix associated with $Q$, by

$$d\lambda_n(Q)(A) := \frac{1}{Z_n(Q)}\exp(-n\,\mathrm{Tr}_n(Q(A)))\,dA,$$

where $dA$ means the "Lebesgue measure" on $M_n^{sa} \cong \mathbb{R}^{n^2}$, i.e.,

$$dA := \prod_{i=1}^n dA_{ii}\prod_{i<j} d(\mathrm{Re}\,A_{ij})\,d(\mathrm{Im}\,A_{ij}) \quad \text{with } A = [A_{ij}],$$

$Q(A)$ is the usual functional calculus, and $Z_n(Q)$ is a normalization constant. It is known (see [22, 17] for example) that the joint eigenvalue distribution on $\mathbb{R}^n$ of $\lambda_n(Q)$ is given as

$$d\tilde\lambda_n(Q)(x_1,\dots,x_n) := \frac{1}{\tilde Z_n(Q)}\exp\Big(-n\sum_{i=1}^n Q(x_i)\Big)\prod_{i<j}(x_i-x_j)^2\prod_{i=1}^n dx_i$$

with a new normalization constant $\tilde Z_n(Q)$. Moreover, the mean eigenvalue distribution on $\mathbb{R}$ of $\lambda_n(Q)$ is defined by

$$\hat\lambda_n(Q) := \int\cdots\int_{\mathbb{R}^n}\frac{1}{n}(\delta_{x_1}+\cdots+\delta_{x_n})\,d\tilde\lambda_n(Q)(x_1,\dots,x_n).$$

In [2] Ben Arous and Guionnet showed the large deviation principle for the empirical eigenvalue distribution of the standard self-adjoint Gaussian random matrix (i.e., $\lambda_n(Q)$ with $Q(x) = x^2/2$). The following is its slight generalization given in [17, 5.4.3]: when $(x_1,\dots,x_n)$ is distributed according to $\tilde\lambda_n(Q)$, the empirical eigenvalue distribution

$$\frac{1}{n}(\delta_{x_1}+\cdots+\delta_{x_n}) \qquad (1.11)$$

satisfies the large deviation principle in the scale $1/n^2$, and the good rate function is given by (1.9). Furthermore, one has $B(Q) = \lim_{n\to\infty}\frac{1}{n^2}\log\tilde Z_n(Q)$, i.e.,

$$B(Q) = \lim_{n\to\infty}\frac{1}{n^2}\log\int\cdots\int_{\mathbb{R}^n}\exp\Big(-n\sum_{i=1}^n Q(x_i)\Big)\prod_{i<j}(x_i-x_j)^2\prod_{i=1}^n dx_i. \qquad (1.12)$$
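For $Q(x) = \rho x^2/2$ the limit (1.12) can be followed explicitly. Here $\mu_Q$ is the semicircular law of variance $1/\rho$ and $B(Q) = -E_Q(\mu_Q) = -\frac34 - \frac12\log\rho$, while the integral in (1.12) is a Gaussian (Mehta) integral, $\int_{\mathbb{R}^n}e^{-\frac{n\rho}{2}\sum_i x_i^2}\prod_{i<j}(x_i-x_j)^2\,dx = (2\pi)^{n/2}(n\rho)^{-n^2/2}\prod_{j=1}^n j!$, a closed form we import from the random matrix literature as an assumption. The Python sketch below (our illustration; names are hypothetical) tabulates $\frac{1}{n^2}$ times its logarithm.

```python
import math


def log_Z_gaussian(n, rho):
    """log of int_{R^n} exp(-n rho/2 sum x_i^2) prod_{i<j} (x_i - x_j)^2 dx.

    Uses Mehta's integral (taken as known): the value is
    (2 pi)^{n/2} (n rho)^{-n^2/2} prod_{j=1}^n j!.
    """
    log_prod_factorials = sum(math.lgamma(j + 1) for j in range(1, n + 1))
    return 0.5 * n * math.log(2.0 * math.pi) - 0.5 * n * n * math.log(n * rho) + log_prod_factorials


def B_finite_n(n, rho):
    """(1/n^2) log of the integral in (1.12) for Q(x) = rho x^2 / 2."""
    return log_Z_gaussian(n, rho) / (n * n)


def B_limit(rho):
    """B(Q) = -E_Q(mu_Q) = -3/4 - (1/2) log rho for Q(x) = rho x^2 / 2."""
    return -0.75 - 0.5 * math.log(rho)


for rho in (0.5, 1.0, 3.0):
    approx = [round(B_finite_n(n, rho), 4) for n in (10, 100, 1000, 4000)]
    print(f"rho={rho}: {approx} -> {B_limit(rho):.4f}")
```

The $\log n$ terms cancel between $(n\rho)^{-n^2/2}$ and $\prod_j j!$, and the remaining error decays roughly like $(\log n)/n$.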

See [8, 9] for the general theory of large deviations. Since $\mu_Q$ is the unique minimizer of (1.9), the random measure (1.11) converges in the weak topology to $\mu_Q$ almost surely, and hence

$$\hat\lambda_n(Q) \to \mu_Q \quad \text{weakly}; \qquad (1.13)$$

see [17, p. 211] and also [7]. From the viewpoint of level-2 large deviation theory (see [8, 9]), the function (1.9) can be regarded as a kind of free analog of the relative entropy with respect to its unique minimizer $\mu_Q$. Thus, following Biane and Speicher [5, §6] and Biane [4, §3], we call the function (1.9) the relative free entropy (or modified free entropy) of $\mu$ relative to $Q$, which is denoted by $\Sigma_Q(\mu)$; that is,

$$\Sigma_Q(\mu) := -\Sigma(\mu) + \int_{\mathbb{R}} Q(x)\,d\mu(x) + B(Q) \quad \text{for } \mu \in \mathcal{M}(\mathbb{R}). \qquad (1.14)$$

We do not call this the "free relative entropy" introduced in [13], which is a slightly different relative-entropy-like quantity $\Sigma(\mu,\nu)$ for two probability measures in the framework of free probability. Indeed, the free relative entropy $\Sigma(\mu,\nu)$ for $\mu,\nu \in \mathcal{M}(\mathbb{R})$ is defined as

$$\Sigma(\mu,\nu) := \iint -\log|x-y|\,d(\mu-\nu)(x)\,d(\mu-\nu)(y).$$

But it is known (see [13, (2.7)]) that

$$\Sigma(\mu,\mu_Q) = \Sigma_Q(\mu)$$

if the support of $\mu$ is included in that of $\mu_Q$.
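For $Q(x) = x^2/2$ the convergence (1.13) is easy to observe numerically. The Python sketch below is our illustration and relies on an outside ingredient: the Dumitriu-Edelman tridiagonal model for $\beta = 2$, whose eigenvalues, after division by $\sqrt n$, have the joint density $\tilde\lambda_n(Q)$ of §§1.4. It samples one matrix, counts eigenvalues in $[-1,1]$ with a Sturm sequence and compares the result with the mass of $\gamma_{0,2}$. The function names and the seed are hypothetical choices.

```python
import math
import random


def sample_gue_tridiagonal(n, rng):
    """Eigenvalue law of lambda_n(Q), Q(x) = x^2 / 2, via the Dumitriu-Edelman tridiagonal
    model with beta = 2 (imported from the random matrix literature, not from this paper).

    (1/sqrt 2) * tridiag(a_1..a_n; b_1..b_{n-1}) with a_i ~ N(0, 2), b_k ~ chi_{2(n-k)} has
    joint eigenvalue density proportional to prod_{i<j} (x_i - x_j)^2 exp(-sum x_i^2 / 2);
    dividing by sqrt(n) gives the joint eigenvalue distribution of lambda_n(Q).
    """
    s = 1.0 / math.sqrt(2.0 * n)
    diag = [rng.gauss(0.0, math.sqrt(2.0)) * s for _ in range(n)]
    # chi_k = sqrt(chi-square_k), and chi-square_k = Gamma(shape k/2, scale 2)
    off = [math.sqrt(rng.gammavariate(n - k, 2.0)) * s for k in range(1, n)]
    return diag, off


def count_below(diag, off, x):
    """Number of eigenvalues < x of the symmetric tridiagonal matrix (Sturm count)."""
    count = 0
    q = diag[0] - x
    if q < 0.0:
        count += 1
    for i in range(1, len(diag)):
        if q == 0.0:
            q = 1e-300  # avoid division by zero in this (non-generic) case
        q = diag[i] - x - off[i - 1] ** 2 / q
        if q < 0.0:
            count += 1
    return count


def semicircle_mass(a, b):
    """gamma_{0,2}([a, b]) from the closed-form CDF."""
    def cdf(x):
        x = max(-2.0, min(2.0, x))
        return 0.5 + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi) + math.asin(x / 2.0) / math.pi
    return cdf(b) - cdf(a)


rng = random.Random(2004)
n = 400
diag, off = sample_gue_tridiagonal(n, rng)
second_moment = (sum(d * d for d in diag) + 2.0 * sum(o * o for o in off)) / n  # (1/n) Tr M^2
frac = (count_below(diag, off, 1.0) - count_below(diag, off, -1.0)) / n
print(f"(1/n) sum x_i^2 = {second_moment:.4f}  (semicircle: 1)")
print(f"fraction in [-1, 1] = {frac:.4f}  (semicircle: {semicircle_mass(-1.0, 1.0):.4f})")
```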



1.5. Large deviations for restricted self-adjoint random matrices. In the course of finding the right free analog of relative entropy, another random matrix model associated with $Q$ and $R > 0$ was introduced in [13]. Here $Q$ is an arbitrary real-valued continuous function whose domain includes $[-R,R]$. The self-adjoint random matrix $\lambda_n(Q;R) \in \mathcal{M}(M_n^{sa})$ restricted to the compact subset $\{A \in M_n^{sa} : \|A\|_\infty \le R\}$ is defined by

$$d\lambda_n(Q;R)(A) := \frac{1}{Z_n(Q;R)}\exp(-n\,\mathrm{Tr}_n(Q(A)))\,\chi_{\{\|A\|_\infty \le R\}}(A)\,dA$$

with a normalization constant $Z_n(Q;R)$. In the above, $\|\cdot\|_\infty$ means the operator norm. The joint eigenvalue distribution of $\lambda_n(Q;R)$, supported in $[-R,R]^n$, is given as

$$d\tilde\lambda_n(Q;R)(x_1,\dots,x_n) := \frac{1}{\tilde Z_n(Q;R)}\exp\Big(-n\sum_{i=1}^n Q(x_i)\Big)\prod_{i<j}(x_i-x_j)^2\prod_{i=1}^n\chi_{[-R,R]}(x_i)\,dx_i$$

with a new normalization constant $\tilde Z_n(Q;R)$. Its mean eigenvalue distribution $\hat\lambda_n(Q;R)$, supported in $[-R,R]$, is defined as in §§1.4. As in the case of $\lambda_n(Q)$, the following large deviation theorem holds: the finite limit $B(Q;R) := \lim_{n\to\infty}\frac{1}{n^2}\log\tilde Z_n(Q;R)$ exists, and when $(x_1,\dots,x_n)$ is distributed according to $\tilde\lambda_n(Q;R)$, the empirical eigenvalue distribution (1.11) satisfies the large deviation principle in the scale $1/n^2$ with the rate function

$$-\Sigma(\mu) + \int_{[-R,R]} Q(x)\,d\mu(x) + B(Q;R) \quad \text{for } \mu \in \mathcal{M}([-R,R]). \qquad (1.15)$$

The proof of this large deviation principle is similar to [17, 5.4.3 and 5.5.1], as noted in [13]. In this setting there also exists a unique minimizer $\mu_{Q,R} \in \mathcal{M}([-R,R])$ of the rate function (1.15), whose value at $\mu_{Q,R}$ is zero. If $R > 0$ is chosen so that $\mu_Q$ in §§1.4 is supported in $[-R,R]$, then $\mu_Q = \mu_{Q,R}$ is seen by comparing the two rate functions, and hence $B(Q) = B(Q;R)$; this assertion is essentially the same as [31, Proposition 2.4] in the single variable case.
1.6. Large deviations for special unitary random matrices. Let $Q$ be a real-valued continuous function on $\mathbb{T}$. Similarly to the real line case in §§1.4, the weighted energy integral

$$-\Sigma(\mu) + \int_{\mathbb{T}} Q(\zeta)\,d\mu(\zeta) \quad \text{for } \mu \in \mathcal{M}(\mathbb{T})$$

admits a unique minimizer $\mu_Q \in \mathcal{M}(\mathbb{T})$ (the equilibrium measure associated with $Q$). Set $B(Q) := \Sigma(\mu_Q) - \int_{\mathbb{T}} Q(\zeta)\,d\mu_Q(\zeta)$. It is known ([16]) that the function

$$-\Sigma(\mu) + \int_{\mathbb{T}} Q(\zeta)\,d\mu(\zeta) + B(Q) \quad \text{for } \mu \in \mathcal{M}(\mathbb{T})$$

is the rate function of the large deviation principle for the empirical eigenvalue distribution of the $n \times n$ unitary random matrix

$$d\lambda_n^U(Q)(U) := \frac{1}{Z_n^U(Q)}\exp(-n\,\mathrm{Tr}_n(Q(U)))\,dU,$$

where $dU$ is the Haar probability measure on $U(n)$, $Q(U)$ is defined via functional calculus, and $Z_n^U(Q)$ is a normalization constant. Furthermore,

$$B(Q) = \lim_{n\to\infty}\frac{1}{n^2}\log\int_{\mathbb{T}^n}\exp\Big(-n\sum_{i=1}^n Q(\zeta_i)\Big)\prod_{1\le i<j\le n}|\zeta_i-\zeta_j|^2\prod_{i=1}^n d\zeta_i,$$

where $d\zeta_i = d\theta_i/2\pi$ for $\zeta_i = e^{\sqrt{-1}\theta_i}$. However, the above unitary random matrix $\lambda_n^U(Q)$ is not suitable for our present purpose, as will be explained in §§1.7. Thus we need to modify the above large deviation result to the setup of $SU(n)$.

Now we begin with the joint eigenvalue distribution of the Haar probability measure on the special unitary group $SU(n)$. Note that the $n$ eigenvalues $\zeta_1,\dots,\zeta_n$ of $U \in SU(n)$ satisfy $\zeta_1\cdots\zeta_n = 1$, i.e., $\zeta_n = (\zeta_1\cdots\zeta_{n-1})^{-1}$, so that the joint density must be a permutation-invariant distribution of $(\zeta_1,\dots,\zeta_{n-1}) \in \mathbb{T}^{n-1}$. The following explicit form of the density seems to be folklore among specialists, and in fact it is easily derived from the Weyl integration formula familiar in representation theory; see [18, p. 104] for example.

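Lemma 1.1. The joint eigenvalue distribution on $\mathbb{T}^{n-1}$ of the Haar probability measure on $SU(n)$ is

$$\frac{1}{n!}\prod_{1\le i<j\le n}|\zeta_i-\zeta_j|^2\prod_{i=1}^{n-1}d\zeta_i \quad \text{with } \zeta_n = (\zeta_1\cdots\zeta_{n-1})^{-1},$$

or

$$\frac{1}{n!\,(2\pi)^{n-1}}\prod_{1\le i<j\le n}\left|e^{\sqrt{-1}\theta_i}-e^{\sqrt{-1}\theta_j}\right|^2\prod_{i=1}^{n-1}d\theta_i$$

with $\theta_n = -(\theta_1+\cdots+\theta_{n-1}) \pmod{2\pi}$.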
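A quick numerical sanity check of Lemma 1.1 (our own sketch, not from [18]): integrating the stated density over $\mathbb{T}^{n-1}$ should give total mass $1$, and, since the adjoint representation of $SU(n)$ is irreducible for $n \ge 2$, the second moment $\int_{SU(n)}|\mathrm{Tr}_n U|^2\,dU$ should also equal $1$. The helper name is hypothetical. Because the integrands are trigonometric polynomials of low degree, a uniform grid integrates them exactly.

```python
import cmath
import itertools
import math


def su_eigen_integral(n, f=lambda zetas: 1.0, m=16):
    """Integrate f(zeta_1, ..., zeta_n) against the density of Lemma 1.1 on T^{n-1}.

    Uses an m-point uniform (midpoint) grid in each angle theta_1, ..., theta_{n-1},
    with theta_n = -(theta_1 + ... + theta_{n-1}).  For trigonometric-polynomial
    integrands of degree < m in each angle this rule is exact.
    """
    h = 2.0 * math.pi / m
    grid = [-math.pi + (j + 0.5) * h for j in range(m)]
    total = 0.0
    for angles in itertools.product(grid, repeat=n - 1):
        zetas = [cmath.exp(1j * t) for t in angles]
        zetas.append(cmath.exp(-1j * sum(angles)))  # zeta_n = (zeta_1 ... zeta_{n-1})^{-1}
        vdm = 1.0
        for i in range(n):
            for j in range(i + 1, n):
                vdm *= abs(zetas[i] - zetas[j]) ** 2
        total += vdm * f(zetas)
    # each grid cell carries mass (h / 2 pi)^{n-1} for d zeta = d theta / 2 pi
    return total * (h / (2.0 * math.pi)) ** (n - 1) / math.factorial(n)


for n in (2, 3, 4):
    mass = su_eigen_integral(n)
    second = su_eigen_integral(n, lambda z: abs(sum(z)) ** 2)  # E |Tr U|^2
    print(n, round(mass, 12), round(second, 12))
```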

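Let $Q$ be a real-valued continuous function on $\mathbb{T}$. For each $n \in \mathbb{N}$ define $\lambda_n^{SU}(Q) \in \mathcal{M}(SU(n))$, the $n \times n$ special unitary random matrix associated with $Q$, by

$$d\lambda_n^{SU}(Q)(U) := \frac{1}{Z_n^{SU}(Q)}\exp(-n\,\mathrm{Tr}_n(Q(U)))\,dU, \qquad (1.16)$$

where $dU$ is the Haar probability measure on $SU(n)$ and $Z_n^{SU}(Q)$ is a normalization constant. By Lemma 1.1 the joint eigenvalue distribution on $\mathbb{T}^{n-1}$ of $\lambda_n^{SU}(Q)$ is given as

$$d\tilde\lambda_n^{SU}(Q)(\zeta_1,\dots,\zeta_{n-1}) = \frac{1}{\tilde Z_n^{SU}(Q)}\exp\Big(-n\sum_{i=1}^n Q(\zeta_i)\Big)\prod_{1\le i<j\le n}|\zeta_i-\zeta_j|^2\prod_{i=1}^{n-1}d\zeta_i$$

with $\zeta_n = (\zeta_1\cdots\zeta_{n-1})^{-1}$.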

The next theorem is the large deviation principle for the empirical eigenvalue distribution of $\lambda_n^{SU}(Q)$, whose proof, based on the explicit form of the density of $\tilde\lambda_n^{SU}(Q)$, will be sketched in the Appendix for the convenience of the reader.

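Theorem 1.2. The finite limit $B(Q) := \lim_{n\to\infty}\frac{1}{n^2}\log\tilde Z_n^{SU}(Q)$ exists. When $(\zeta_1,\dots,\zeta_{n-1})$ is distributed on $\mathbb{T}^{n-1}$ according to $\tilde\lambda_n^{SU}(Q)$, the empirical distribution $\frac{1}{n}(\delta_{\zeta_1}+\cdots+\delta_{\zeta_{n-1}}+\delta_{\zeta_n})$ with $\zeta_n = (\zeta_1\cdots\zeta_{n-1})^{-1}$ satisfies the large deviation principle in the scale $1/n^2$ with the rate function

$$\Sigma_Q(\mu) := -\Sigma(\mu) + \int_{\mathbb{T}} Q(\zeta)\,d\mu(\zeta) + B(Q) \quad \text{for } \mu \in \mathcal{M}(\mathbb{T}). \qquad (1.17)$$

Furthermore, there exists a unique minimizer $\mu_Q \in \mathcal{M}(\mathbb{T})$ of the rate function, so that $\Sigma_Q(\mu_Q) = 0$.

As before, we call the rate function (1.17) the relative free entropy of $\mu$ with respect to $Q$, which is denoted by $\Sigma_Q(\mu)$ as in (1.14).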

1.7. Ricci curvature tensor of SU(n). Let $M$ be a smooth complete Riemannian manifold of dimension $m$, and let $\mathrm{Ric}(M)$ denote the Ricci curvature tensor of $M$. For a real-valued $C^2$ function $\Psi$ on $M$, the Hessian of $\Psi$ is denoted by $\mathrm{Hess}(\Psi)$. Our arguments in §3 and §5 will need to verify the so-called Bakry and Emery criterion with a positive constant $\rho$:

$$\mathrm{Ric}(M) + \mathrm{Hess}(\Psi) \ge \rho I_m; \qquad (1.18)$$

see [1] and Theorem 2.1 below.

The Ricci curvature tensor of $U(n)$ is known to be degenerate (it vanishes in the direction of the center $\{e^{\sqrt{-1}t}I\}$), while that of $SU(n)$ is a positive constant multiple of the metric (see [23], a nice reference for the topic). A straightforward computation shows that the Ricci curvature tensor of $SU(n)$ with respect to the Riemannian structure associated with $\mathrm{Tr}_n$ is

$$\mathrm{Ric}(SU(n)) = \frac{n}{2}\,I_{n^2-1}. \qquad (1.19)$$

This is the reason why we have presented Theorem 1.2 using $SU(n)$ instead of $U(n)$.
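The value in (1.19) can be double-checked with the standard formula for a bi-invariant metric on a compact Lie group, $\mathrm{Ric}(X,Y) = -\frac14 B(X,Y) = \frac14\sum_i\langle[X,E_i],[Y,E_i]\rangle$, where $B$ is the Killing form and $(E_i)$ is an orthonormal basis of the Lie algebra. The Python sketch below (our illustration, not the computation of [23]) takes $\langle A,B\rangle = \mathrm{Re}\,\mathrm{Tr}_n(A^*B)$ on $\mathfrak{su}(n)$ and an orthonormal basis built from generalized Gell-Mann matrices. The helper names are hypothetical.

```python
import math


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def bracket(a, b):
    """Commutator [A, B] = AB - BA."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(len(a))] for i in range(len(a))]


def inner(a, b):
    """Real inner product <A, B> = Re Tr(A^* B) = Re sum conj(A_ij) B_ij."""
    return sum((x.conjugate() * y).real for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def su_basis(n):
    """Orthonormal basis of su(n) (traceless skew-Hermitian matrices) for <., .>."""
    basis = []
    r = 1.0 / math.sqrt(2.0)
    for j in range(n):
        for k in range(j + 1, n):
            e = [[0j] * n for _ in range(n)]
            e[j][k], e[k][j] = r, -r  # real antisymmetric element
            basis.append(e)
            e = [[0j] * n for _ in range(n)]
            e[j][k], e[k][j] = 1j * r, 1j * r  # i times a real symmetric element
            basis.append(e)
    for l in range(1, n):
        e = [[0j] * n for _ in range(n)]
        norm = math.sqrt(l * (l + 1))
        for i in range(l):
            e[i][i] = 1j / norm
        e[l][l] = -1j * l / norm  # traceless diagonal element
        basis.append(e)
    return basis


def ricci_matrix(n):
    """Matrix of Ric(E_p, E_q) = (1/4) sum_i <[E_p, E_i], [E_q, E_i]> on su(n)."""
    basis = su_basis(n)
    ads = [[bracket(x, e) for e in basis] for x in basis]
    d = len(basis)
    return [[0.25 * sum(inner(ads[p][i], ads[q][i]) for i in range(d)) for q in range(d)]
            for p in range(d)]


for n in (2, 3):
    ric = ricci_matrix(n)
    diag = [round(ric[i][i], 12) for i in range(len(ric))]
    print(f"n={n}: diagonal of Ric = {diag}, expected n/2 = {n / 2}")
```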

1.8. Differentiability of trace functions. A derivative formula as well as higher-order differentiability for a certain kind of trace function will be essential in proving the main result (Theorem 3.3) in §3. The topic seems rather familiar to specialists; however, we can find no appropriate literature. Here a lemma is recorded in a form tailor-made for our later use, without full generality.

Let $f(t)$ be a real-valued function on an interval $(a,b)$, and let $\lambda_1,\lambda_2,\dots$ be distinct points in $(a,b)$. The divided differences $f^{[r]}$ for $r = 0,1,2,\dots$ are recursively introduced as follows: $f^{[0]}(\lambda_1) := f(\lambda_1)$ and

$$f^{[r]}(\lambda_1,\lambda_2,\dots,\lambda_{r+1}) := \frac{f^{[r-1]}(\lambda_1,\lambda_2,\dots,\lambda_r) - f^{[r-1]}(\lambda_2,\dots,\lambda_r,\lambda_{r+1})}{\lambda_1-\lambda_{r+1}}.$$

When the $\lambda_i$'s are not necessarily distinct, $f^{[r]}(\lambda_1,\lambda_2,\dots,\lambda_{r+1})$ can be defined by continuity as long as $f \in C^r(a,b)$; for example, $f^{[1]}(\lambda,\lambda) = f'(\lambda)$ and $f^{[2]}(\lambda,\lambda,\lambda) = f''(\lambda)/2$. See [10, §11.2] for basic properties of divided differences. Let $A \in M_n^{sa}$, all of whose eigenvalues are in $(a,b)$, and let $A = \sum_{i=1}^{l}\lambda_iP_i$ be the spectral decomposition with distinct eigenvalues $\lambda_1,\dots,\lambda_l$ in $(a,b)$. For each $H_1,H_2,\dots,H_r \in M_n^{sa}$ we define

$$f^{[r]}(A)\circ(H_1,H_2,\dots,H_r) := \sum_{\sigma\in S_r}\sum_{i_1,\dots,i_{r+1}=1}^{l} f^{[r]}(\lambda_{i_1},\lambda_{i_2},\dots,\lambda_{i_{r+1}})\,P_{i_1}H_{\sigma(1)}P_{i_2}H_{\sigma(2)}\cdots P_{i_r}H_{\sigma(r)}P_{i_{r+1}},$$

where $S_r$ is the set of all permutations on $\{1,\dots,r\}$. In particular, note ([3, V.3.3]) that if $f \in C^1(a,b)$ and $A = U\,\mathrm{diag}(\lambda_1,\dots,\lambda_n)U^*$ is a diagonalization, then

$$\frac{d}{dt}f(A+tH_1)\Big|_{t=0} = f^{[1]}(A)\circ H_1 = U\Big(\big[f^{[1]}(\lambda_i,\lambda_j)\big]_{i,j=1}^n\circ(U^*H_1U)\Big)U^*,$$

where $\circ$ stands for the Schur product. The next lemma can be shown in essentially the same way as the proof of [3, V.3.3].
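For $2 \times 2$ real symmetric matrices the first-order formula can be checked directly. The Python sketch below (our illustration; all helper names are hypothetical) implements the recursive divided differences and compares $[f^{[1]}(\lambda_i,\lambda_j)]\circ H$, for diagonal $A = \mathrm{diag}(\lambda_1,\lambda_2)$ (so that $U = I$), with a central finite difference of $t \mapsto f(A+tH)$. Here $f$ of a symmetric matrix is computed from its spectral decomposition.

```python
import math


def divided_difference(f, points):
    """f^{[r]}(lambda_1, ..., lambda_{r+1}) for distinct points, by the recursive definition."""
    if len(points) == 1:
        return f(points[0])
    return ((divided_difference(f, points[:-1]) - divided_difference(f, points[1:]))
            / (points[0] - points[-1]))


def sym2_funcalc(f, a, b, c):
    """f(M) for the real symmetric matrix M = [[a, b], [b, c]] via its spectral decomposition."""
    mid, rad = 0.5 * (a + c), math.hypot(0.5 * (a - c), b)
    if rad == 0.0:
        return [[f(mid), 0.0], [0.0, f(mid)]]
    lam1, lam2 = mid + rad, mid - rad
    # an eigenvector for lam1; both formulas are valid, pick the one bounded away from 0
    v = (lam1 - c, b) if a >= c else (b, lam1 - a)
    norm = math.hypot(v[0], v[1])
    v = (v[0] / norm, v[1] / norm)
    w = (-v[1], v[0])  # orthogonal unit vector, an eigenvector for lam2
    f1, f2 = f(lam1), f(lam2)
    return [[f1 * v[i] * v[j] + f2 * w[i] * w[j] for j in range(2)] for i in range(2)]


def first_derivative_formula(f, fprime, lams, h):
    """[f^{[1]}(lambda_i, lambda_j)] o H for A = diag(lams), i.e. U = I."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            if lams[i] == lams[j]:
                dd = fprime(lams[i])  # f^{[1]}(lambda, lambda) = f'(lambda)
            else:
                dd = divided_difference(f, [lams[i], lams[j]])
            out[i][j] = dd * h[i][j]
    return out


def central_difference(f, lams, h, eps=1e-6):
    """(f(A + eps H) - f(A - eps H)) / (2 eps) with A = diag(lams)."""
    plus = sym2_funcalc(f, lams[0] + eps * h[0][0], eps * h[0][1], lams[1] + eps * h[1][1])
    minus = sym2_funcalc(f, lams[0] - eps * h[0][0], -eps * h[0][1], lams[1] - eps * h[1][1])
    return [[(plus[i][j] - minus[i][j]) / (2.0 * eps) for j in range(2)] for i in range(2)]


lams, H = (0.3, -0.8), [[0.5, 1.2], [1.2, -0.4]]
print(first_derivative_formula(math.exp, math.exp, lams, H))
print(central_difference(math.exp, lams, H))
```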

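Lemma 1.3. Let $A, H_1,\dots,H_m \in M_n^{sa}$ and set $G(x) := A + \sum_{k=1}^m x_kH_k$ for $x = (x_1,\dots,x_m) \in \mathbb{R}^m$. Let $f$ be a real-valued $C^r$ function on $(a,b)$ for some $r \in \mathbb{N}$. If the eigenvalues of $G(x)$ are in $(a,b)$ for all $x$ in an open domain $D$ of $\mathbb{R}^m$, then the function $\mathrm{Tr}_n(f(G(x)))$ is $C^r$ on $D$ and

$$\frac{\partial^r}{\partial x_{k_1}\partial x_{k_2}\cdots\partial x_{k_r}}\mathrm{Tr}_n(f(G(x))) = \mathrm{Tr}_n\Big(f^{[r]}(G(x))\circ(H_{k_1},H_{k_2},\dots,H_{k_r})\Big) = \mathrm{Tr}_n\Big(\big((f')^{[r-1]}(G(x))\circ(H_{k_1},\dots,H_{k_{r-1}})\big)H_{k_r}\Big)$$

for all $1 \le k_1,k_2,\dots,k_r \le m$ and $x \in D$. In particular,

$$\frac{\partial}{\partial x_k}\mathrm{Tr}_n(f(G(x))) = \mathrm{Tr}_n(f'(G(x))H_k)$$

for all $1 \le k \le m$ and $x \in D$.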
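The trace formulas of Lemma 1.3 for $r = 1, 2$ can be tested by finite differences in the simplest case $n = m = 2$ with $G(0) = A = \mathrm{diag}(\lambda_1,\lambda_2)$. Then $\mathrm{Tr}_2 f(G(x))$ only requires the two eigenvalues of $G(x)$. At $x = 0$ the first derivative is $\sum_i f'(\lambda_i)(H_k)_{ii}$, and the second derivative reduces to $\sum_{i,j}(f')^{[1]}(\lambda_i,\lambda_j)(H_{k_1})_{ij}(H_{k_2})_{ji}$. The Python sketch below is our illustration, with hypothetical helper names and $f = \exp$ as an arbitrary test function.

```python
import math


def trace_f_sym2(f, m):
    """Tr f(M) = f(l_1) + f(l_2) for a real symmetric 2x2 matrix M with eigenvalues l_1, l_2."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    mid, rad = 0.5 * (a + c), math.hypot(0.5 * (a - c), b)
    return f(mid + rad) + f(mid - rad)


def g_of_x(lams, hs, x):
    """G(x) = diag(lams) + sum_k x_k H_k, as in Lemma 1.3 with A = diag(lams)."""
    out = [[lams[0], 0.0], [0.0, lams[1]]]
    for xk, h in zip(x, hs):
        for i in range(2):
            for j in range(2):
                out[i][j] += xk * h[i][j]
    return out


def first_formula(fp, lams, h):
    """Tr(f'(G(0)) H) with G(0) = diag(lams)."""
    return fp(lams[0]) * h[0][0] + fp(lams[1]) * h[1][1]


def second_formula(fp, fpp, lams, hk, hl):
    """sum_{i,j} (f')^{[1]}(l_i, l_j) (H_k)_{ij} (H_l)_{ji} for G(0) = diag(lams), distinct lams."""
    total = 0.0
    for i in range(2):
        for j in range(2):
            if i == j:
                dd = fpp(lams[i])  # (f')^{[1]}(l, l) = f''(l)
            else:
                dd = (fp(lams[i]) - fp(lams[j])) / (lams[i] - lams[j])
            total += dd * hk[i][j] * hl[j][i]
    return total


f = fp = fpp = math.exp
lams = (0.7, -0.2)
hs = [[[1.0, 0.5], [0.5, -1.0]], [[0.3, -2.0], [-2.0, 0.1]]]


def tr_at(x):
    return trace_f_sym2(f, g_of_x(lams, hs, x))


def first_fd(k, eps=1e-5):
    x_plus, x_minus = [0.0, 0.0], [0.0, 0.0]
    x_plus[k], x_minus[k] = eps, -eps
    return (tr_at(x_plus) - tr_at(x_minus)) / (2.0 * eps)


def mixed_fd(k, l, eps=1e-4):
    def shifted(sk, sl):
        x = [0.0, 0.0]
        x[k] += sk * eps
        x[l] += sl * eps
        return tr_at(x)
    return (shifted(1, 1) - shifted(1, -1) - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * eps * eps)


for k in range(2):
    print(k, first_fd(k), first_formula(fp, lams, hs[k]))
```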



2. Free LSI for measures on R 

In this section we give a supplementary comment on Biane's work [4] on the free version of the logarithmic Sobolev inequality (LSI for short) for measures on $\mathbb{R}$. LSIs first attracted interest in constructive quantum field theory, and it was Gross [12] who first presented an LSI for Gaussian measures in full generality. Among the huge number of contributions to the topic, Bakry and Emery [1] gave a simple "local" criterion, the so-called Bakry and Emery criterion (see (1.18)), for a given measure to satisfy an LSI. Let $M$ be an $m$-dimensional smooth complete Riemannian manifold with the volume measure $dx$. The precise statement that Bakry and Emery established is as follows:

Theorem 2.1. (Bakry and Emery [1]) Let $\Psi \in C^2(M)$, and set $d\nu(x) := \frac{1}{Z}e^{-\Psi(x)}\,dx$ with a normalization constant $Z$. Assume that the Bakry and Emery criterion $\mathrm{Ric}(M) + \mathrm{Hess}(\Psi) \ge \rho I_m$ holds with a constant $\rho > 0$. Then, for every $\mu \in \mathcal{M}(M)$ absolutely continuous with respect to $\nu$, one has

$$S(\mu,\nu) \le \frac{1}{2\rho}\int_M\left\|\nabla\log\frac{d\mu}{d\nu}\right\|^2 d\mu \qquad (2.1)$$

whenever the density $d\mu/d\nu$ is smooth on $M$.
Recall that the left-hand side of (2.1) is the relative entropy (1.1), while the integral on the right-hand side is nothing but the (classical) relative Fisher information of $\mu$ relative to $\nu$. Motivated by and based on this theorem, the following "free LSI" was shown by Biane:

Theorem 2.2. (Biane [4]) Assume that $Q$ is a real-valued $C^1$ function on $\mathbb{R}$ such that $Q(x) - \frac{\rho}{2}x^2$ is convex on $\mathbb{R}$ with a constant $\rho > 0$. Then, for every $\mu \in \mathcal{M}(\mathbb{R})$ one has

$$\Sigma_Q(\mu) \le \frac{1}{2\rho}\,\Phi_Q(\mu). \qquad (2.2)$$

Obviously, the above convexity assumption on $Q$ is equivalent to $Q''(x) \ge \rho$ on $\mathbb{R}$ as long as $Q$ is a $C^2$ function.

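When $Q(x) = \rho x^2/2$ with $\rho > 0$, the relative free entropy $\Sigma_Q(\mu)$ is given as

$$\Sigma_Q(\mu) = -\Sigma(\mu) + \frac{\rho}{2}\int_{\mathbb{R}}x^2\,d\mu(x) - \frac{1}{2}\log\rho - \frac{3}{4},$$

and its minimizer is the $(0, 1/\rho)$-semicircular distribution $\gamma_{0,2/\sqrt{\rho}}$ (see (1.10)). Thus, in this special case, for any $\mu \in \mathcal{M}(\mathbb{R})$ having an $L^3$-density $p$ and satisfying $\int_{\mathbb{R}}x^2\,d\mu(x) < +\infty$, the free LSI becomes

$$-\Sigma(\mu) + \frac{\rho}{2}\int_{\mathbb{R}}x^2\,d\mu(x) - \frac{1}{2}\log\rho - \frac{3}{4} \le \frac{1}{2\rho}\left(\Phi(\mu) - 2\rho + \rho^2\int_{\mathbb{R}}x^2\,d\mu(x)\right) \qquad (2.3)$$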
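because $2\int_{\mathbb{R}}(Hp)(x)\,x\,p(x)\,dx = 1$. Indeed, for $\varepsilon > 0$, exchanging the roles of $x$ and $t$ and averaging gives

$$2\int_{\mathbb{R}}\left(\int_{\mathbb{R}}\frac{x-t}{(x-t)^2+\varepsilon^2}\,p(t)\,dt\right)x\,p(x)\,dx = \int_{\mathbb{R}}\int_{\mathbb{R}}\frac{(x-t)^2}{(x-t)^2+\varepsilon^2}\,p(t)\,p(x)\,dt\,dx = \int_{\mathbb{R}}p(t)\left(\int_{\mathbb{R}}\left(1 - \frac{\varepsilon^2}{(t-x)^2+\varepsilon^2}\right)p(x)\,dx\right)dt,$$

so that

$$2\int_{\mathbb{R}}\left(\int_{\mathbb{R}}\frac{x-t}{(x-t)^2+\varepsilon^2}\,p(t)\,dt\right)x\,p(x)\,dx = 1 - \int_{\mathbb{R}}p(t)\left(\int_{\mathbb{R}}\frac{\varepsilon^2}{(t-x)^2+\varepsilon^2}\,p(x)\,dx\right)dt.$$

Letting $\varepsilon \searrow 0$ gives $2\int_{\mathbb{R}}(Hp)(x)\,x\,p(x)\,dx = 1$ as long as $p \in L^3(\mathbb{R})$ (see [17, pp. 92-93]). The inequality (2.3) can be rewritten as

$$\chi(\mu) \ge -\frac{1}{2\rho}\Phi(\mu) - \frac{1}{2}\log\rho + \frac{1}{2}\log 2\pi + 1$$

thanks to the formula (1.2). Maximizing the above right-hand side over $\rho > 0$ (the maximum is attained at $\rho = \Phi(\mu)$) gives Voiculescu's inequality ([31, Proposition 7.9])

$$\chi(\mu) \ge \frac{1}{2}\log\frac{2\pi e}{\Phi(\mu)}. \qquad (2.4)$$

(The last argument is contained in [5, §§7.2].) In this way, the free LSI in Theorem 2.2 for the functions $Q(x) = \rho x^2/2$ with $\rho > 0$ is equivalent to the inequality (2.4).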
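The special case $Q(x) = \rho x^2/2$ can be tested on the semicircular family. The Python sketch below is our illustration with hypothetical names. It uses the standard closed forms $\Sigma = -\frac14 + \frac12\log v$ and $\Phi = 1/v$ for the centered semicircular law of variance $v$, taken as known here. It checks that (2.3) holds and is an equality exactly at $v = 1/\rho$. It also checks that Voiculescu's inequality $\chi(\mu) \ge \frac12\log\frac{2\pi e}{\Phi(\mu)}$ is an equality for semicircular laws, with the maximum over $\rho$ of $-\frac{\Phi}{2\rho} - \frac12\log\rho + \frac12\log 2\pi + 1$ attained at $\rho = \Phi$.

```python
import math


def sigma_semicircle(v):
    """Sigma of the centered semicircular law with variance v: -1/4 + (1/2) log v."""
    return -0.25 + 0.5 * math.log(v)


def phi_semicircle(v):
    """Free Fisher information Phi of the centered semicircular law with variance v: 1/v."""
    return 1.0 / v


def lsi_sides(rho, v):
    """Left and right sides of (2.3) for mu = semicircular law of variance v."""
    lhs = -sigma_semicircle(v) + 0.5 * rho * v - 0.5 * math.log(rho) - 0.75
    rhs = (phi_semicircle(v) - 2.0 * rho + rho * rho * v) / (2.0 * rho)
    return lhs, rhs


def chi(v):
    """chi(mu) = Sigma(mu) + 3/4 + (1/2) log 2 pi, formula (1.2)."""
    return sigma_semicircle(v) + 0.75 + 0.5 * math.log(2.0 * math.pi)


def chi_lower_bound(rho, phi):
    """-Phi/(2 rho) - (1/2) log rho + (1/2) log 2 pi + 1, the bound obtained from (2.3)."""
    return -phi / (2.0 * rho) - 0.5 * math.log(rho) + 0.5 * math.log(2.0 * math.pi) + 1.0


def voiculescu_bound(phi):
    """(1/2) log(2 pi e / Phi)."""
    return 0.5 * math.log(2.0 * math.pi * math.e / phi)


for rho, v in [(1.0, 0.5), (1.0, 1.0), (2.0, 3.0)]:
    print(rho, v, lsi_sides(rho, v))
```

With $u = \rho v$, the inequality (2.3) reads $\frac12(u - 1 - \log u) \le \frac{(1-u)^2}{2u}$, which is equivalent to $\log u \ge 1 - 1/u$.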

In [4, Theorem 3.1] Biane proved Theorem 2.2 when both $Q$ and the density of $\mu$ are sufficiently smooth, and the proof of the extension to the general case was omitted. It may also be worth noting that Lemma 1.3 was implicitly used in [4]. The rest of this section is a supplement to Biane's proof, completing the proof of Theorem 2.2. We need the following general technical lemma.

Lemma 2.3. Let $Q$ and $Q_k$, $k \in \mathbb{N}$, be real-valued continuous functions on $\mathbb{R}$ satisfying the following two conditions:

(a) $Q_k$ converges to $Q$ uniformly on every finite interval;

(b) there exists a real-valued continuous function $\tilde Q$ on $\mathbb{R}$ such that

$$\lim_{|x|\to+\infty}|x|\exp(-\varepsilon\tilde Q(x)) = 0 \quad \text{for every } \varepsilon > 0$$

and $Q_k(x) \ge \tilde Q(x)$ for all $k \in \mathbb{N}$ (so $Q(x) \ge \tilde Q(x)$).

Then the $B(Q_k)$'s and $B(Q)$ are defined as finite real numbers (see §§1.4), and one has $\lim_{k\to\infty}B(Q_k) = B(Q)$.

Proof. By assumption (b) we can apply the large deviation theorem for self-adjoint random matrices associated with the given $Q$ and the $Q_k$'s. Let $\mu_Q$ and $\mu_{Q_k}$ be the equilibrium measures associated with $Q$ and $Q_k$, respectively, and choose $R > 0$ so that $\mu_Q$ is supported in $[-R,R]$. For each $\varepsilon > 0$, thanks to assumption (a) we can choose $k_0$ so that $|Q_k(x) - Q(x)| \le \varepsilon$ for all $x \in [-R,R]$ and all $k \ge k_0$. Then for $k \ge k_0$ we have

$$\begin{aligned}
B(Q) &= B(Q;R) \\
&= \lim_{n\to\infty}\frac{1}{n^2}\log\int_{[-R,R]^n}\exp\Big(-n\sum_{i=1}^n Q(x_i)\Big)\prod_{i<j}(x_i-x_j)^2\prod_{i=1}^n dx_i \\
&\le \liminf_{n\to\infty}\frac{1}{n^2}\log\int_{[-R,R]^n}\exp\Big(-n\sum_{i=1}^n\big(Q_k(x_i)-\varepsilon\big)\Big)\prod_{i<j}(x_i-x_j)^2\prod_{i=1}^n dx_i \\
&\le \varepsilon + \liminf_{n\to\infty}\frac{1}{n^2}\log\int_{\mathbb{R}^n}\exp\Big(-n\sum_{i=1}^n Q_k(x_i)\Big)\prod_{i<j}(x_i-x_j)^2\prod_{i=1}^n dx_i \\
&= \varepsilon + B(Q_k),
\end{aligned}$$

so that $B(Q) \le \liminf_{k\to\infty}B(Q_k)$ since $\varepsilon$ is arbitrary.

In what follows, we apply some techniques used in [17, §5.5] and [14]. For $a > 0$ define

$$F(x,y) := -\log|x-y| + \frac{1}{2}\big(Q(x)+Q(y)\big), \qquad F_a(x,y) := \min\{F(x,y), a\};$$

$$F_k(x,y) := -\log|x-y| + \frac{1}{2}\big(Q_k(x)+Q_k(y)\big), \qquad F_{k,a}(x,y) := \min\{F_k(x,y), a\}.$$

Note that the double integrals of $F(x,y)$ and $F_k(x,y)$ with respect to $\mu \in \mathcal{M}(\mathbb{R})$ are the weighted energy integrals $E_Q(\mu)$ and $E_{Q_k}(\mu)$ associated with $Q$ and $Q_k$, respectively. Since the tightness of $(\mu_{Q_k})$ can be shown as in the proof of [17, 5.5.3], a subsequence $(\mu_{Q_{k(l)}})$ can be chosen so that $\mu_{Q_{k(l)}}$ converges weakly to some $\mu_0 \in \mathcal{M}(\mathbb{R})$ and

$$\lim_{l\to\infty}\iint_{\mathbb{R}^2}F_{k(l)}(x,y)\,d\mu_{Q_{k(l)}}(x)\,d\mu_{Q_{k(l)}}(y) = \liminf_{k\to\infty}\iint_{\mathbb{R}^2}F_k(x,y)\,d\mu_{Q_k}(x)\,d\mu_{Q_k}(y) = \liminf_{k\to\infty}\big(-B(Q_k)\big).$$
As in the proof of [17, 5.5.2] it is seen that $F_{k,a}(x,y) \to F_a(x,y)$ uniformly as $k \to \infty$ for each $a > 0$. Hence we have

$$\begin{aligned}
-B(Q) &\le \iint_{\mathbb{R}^2}F(x,y)\,d\mu_0(x)\,d\mu_0(y) = \sup_{a>0}\iint_{\mathbb{R}^2}F_a(x,y)\,d\mu_0(x)\,d\mu_0(y) \\
&= \sup_{a>0}\lim_{l\to\infty}\iint_{\mathbb{R}^2}F_{k(l),a}(x,y)\,d\mu_{Q_{k(l)}}(x)\,d\mu_{Q_{k(l)}}(y) \\
&\le \liminf_{k\to\infty}\big(-B(Q_k)\big),
\end{aligned}$$

where the first inequality comes from the fact that $\mu_Q$ is a minimizer of $E_Q(\mu)$ with $-B(Q) = E_Q(\mu_Q)$, and the last one from $F_{k,a} \le F_k$. Thus $B(Q) \ge \limsup_{k\to\infty}B(Q_k)$ follows. □

Now let us prove Theorem 2.2 for the general case. Assume that $\mu$ has the density $p = d\mu/dx \in L^3(\mathbb{R})$ and moreover that $\Phi_Q(\mu) = 4\int\big((Hp)(x) - \frac{1}{2}Q'(x)\big)^2\,d\mu(x)$ is finite. Since $Hp \in L^2(\mathbb{R},\mu)$ by the former assumption, the latter implies $Q' \in L^2(\mathbb{R},\mu)$ as well.

At first, suppose further that \i is compactly supported. For each e > choose a non- 
negative C°° function <j> £ supported in [— e, e] with J 4> £ (x)dx = 1, and consider the convo- 
lution Q £ := Q * (f> e . Then Q e 's are C°° functions, and Q £ — > Q and Q £ — > Q' uniformly on 
each finite interval as e \ 0. (The last assertion is seen because Q' £ = Q' * 4> e follows from 
the C 1 of Q.) The convexity assumption of Q means that 

XQ (si) + (1 - X)Q (x 2 ) - Q (Axi + (1 - A)x 2 ) > |a(1 - A) ( Xl - x 2 f 

for all xi,x 2 G R and < A < 1. This implies the same convexity of Q £ so that Q £ (x) > p 
for all x £ R. Define p £ := p * (fr £ and |U £ G A'f(R) by dp £ {x) := p £ (x)dx. Moreover, 
consider Q fJle (x) := 2 J R log|x — y\dfi £ (y), which is a C°° function on R. Then we have 
Q'n e ( x ) = 2(Hp £ )(x) for a.e. x 6 R (see the proof of Lemma 3.2 (i) in §3). Hence, the proof 
of Theorem 2.2 in [4] implies that 

XqM<±*qM fore>0. (2.5) 

Since the convexity assumption of Q implies that Q £ (x) > ax 2 + b for some a > and b G R, 
Lemma 2.3 gives 

lim B(Q £ ) = B(Q). 

Furthermore, notice that \\p £ — p||i 3 ~^ an d hence \\Hp £ — Hp\\ L s — > as e \ so that we 
get 

lim / Q £ (x) dfx £ (x) = / Q(x)d/j,(x), 

lrni^^ £ )(x)-^(x)) d Me (x)=j^((fTp)(x)-iQ'(x)) c^(x). 

From (2.5) and the above convergences together with the upper semicontinuity of £(//) (see 
[17, 5.3.2]) we have 

< limiiif £ Qe (/i e ) < Iim^-*Q e (/i e ) = ^-$q(//). 
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Next, let us treat the case where fx is not compactly supported. For R > set dfi R (x) := 
^[-r,r}) X[-r,r](x) dfi{x), whose density is given by p R := j^^Rjj X[-R,R] P- Then, \\p R - 
p\\l' a an d \\Hpn — Hp\\ L 3 — > as R — > +oo so that 

lim / ({Hp R )(x)) 2 p R {x)dx = I {{Hp)(x)) 2 p(x) dx 

R-^+oo J R J n 

and 

/ (Q'(x)) 2 \pr(x) -p{x)\ dx 

< / {Q'(x)) 2 p(x)dx+( 1 -l) f (Q'(x)) 2 p(x)dx 

Jk\[-r,r] \/nl - - K >- K JJ / Jn 

— > as R — ► +oo. 
Furthermore, we have 

/ (Hp R )(x)Q'(x)dfi R (x)- f (Hp)(x)Q'(x)dfi(x) 

JR 

2 ^ X /2 !/ 2 



P{x)dX ) (j^Q'^M^dx^) 
j^{(Hp R )(x)-{Hp)(x)) 2 p{x)dx^j ' (J^Q'(xfp(x)dx^ 

f ({Hp R )(x)) 2 p{x)dx+( 1 -l) / ((Hp R )(x)) 2 p(x)dx\ 

R\[-R,R] V^li-^-KJJ / J 

X Uq'{x) 2 p{x) 

+ Q \{Hp R ){x) - (Hp)(x)\ 3 dx^j ' (yj p{xfdx^j 1 (J Q'{x) 2 p{x)dx^j 



1/2 



. 1 ' 2 

• dx 



\ V 3 / r \ V 6 / r \ V 2 



— ► as R — > +oo. 

In the above, the first inequality is obtained by the Cauchy-Schwarz inequality with respect 
to dfi(x) = p{x)dx and the second one is by the Holder inequality with respect to dx. From 
the above convergences we get 

lim $ q (li q ) = <S> q (»). (2.6) 

_R— >+oo 

On the other hand, we get 

£q(m) < liminf T> Q (fi R ) (2.7) 

thanks to the monotone convergence theorem and the upper semicontinuity of There- 
fore, the desired inequality follows from (2.6), (2.7) and the first case of fx being compactly 
supported. □ 

3. Free LSI for measures on T 

In this section we will proceed to the free analog of logarithmic Sobolev inequalities for measures on $\mathbb{T}$. The idea here is essentially the same as in Biane's work [4] mentioned in §2. Namely, the free analog arises as the scaling limit in the scale $1/n^2$ of the classical one (2.1) on the special unitary group $SU(n)$. However, there is an essential difference between his argument and ours: we need the full power of the large deviation principle (especially the almost sure weak convergence of the empirical eigenvalue distribution to the equilibrium measure), while the weak convergence of the mean eigenvalue distribution is enough in the proof of [4, Theorem 3.1].

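Let us start with some lemmas.

Lemma 3.1. Let $Q$ be a harmonic function on a neighborhood of the unit disk $\{\zeta \in \mathbb{C} : |\zeta| \le 1\}$. For each $n \in \mathbb{N}$ and each $U \in SU(n)$ define $Q(U)$ via the functional calculus and set $\Psi(U) := \mathrm{Tr}_n(Q(U))$. Then one has:

(i) The function $\Psi(U)$ on $SU(n)$ is $C^\infty$.

(ii) $\nabla\Psi(U) = \sqrt{-1}\big(Q'(U) - \frac1n\mathrm{Tr}_n(Q'(U))I_n\big)$.

(iii) If $Q(e^{\sqrt{-1}t}) - \frac{\rho}{2}t^2$ is convex on $\mathbb{R}$ for some constant $\rho \in \mathbb{R}$, then $\mathrm{Hess}(\Psi) \ge \rho I_{n^2-1}$.

Proof. Set $f(t) := Q(e^{\sqrt{-1}t})$ for $t \in \mathbb{R}$, and let $Y_k := \sqrt{-1}X_k$ with $X_k = X_k^*$, $1 \le k \le n^2-1$, be an orthonormal basis of the Lie algebra $su(n) = \{T \in M_n(\mathbb{C}) : T + T^* = 0,\ \mathrm{Tr}_n(T) = 0\}$ $(\cong \mathbb{R}^{n^2-1})$. For any $U = e^{\sqrt{-1}A} \in SU(n)$ with $\sqrt{-1}A \in su(n)$ and for $x = (x_1,\dots,x_{n^2-1}) \in \mathbb{R}^{n^2-1}$, we write

$$\Psi\Big(\exp\Big(\sqrt{-1}A + \sum_{k=1}^{n^2-1}x_kY_k\Big)\Big) = \mathrm{Tr}_n f\Big(A + \sum_{k=1}^{n^2-1}x_kX_k\Big).$$

The smoothness ($C^\infty$) of $f$ on $\mathbb{R}$ immediately follows from the assumption on $Q$. In fact, for each $t_0 \in \mathbb{R}$ the function $f(t_0+t)$ has a power series expansion for $t$ near 0. Hence, thanks to Lemma 1.3 we have (i) and

$$\nabla\Psi(U) = \sum_{k=1}^{n^2-1}\mathrm{Tr}_n\big(f'(A)X_k\big)Y_k = \sum_{k=1}^{n^2-1}\mathrm{Tr}_n\Big(\Big(f'(A) - \frac1n\mathrm{Tr}_n(f'(A))I_n\Big)X_k\Big)\sqrt{-1}X_k$$
$$= \sqrt{-1}\Big(f'(A) - \frac1n\mathrm{Tr}_n(f'(A))I_n\Big) = \sqrt{-1}\Big(Q'(U) - \frac1n\mathrm{Tr}_n(Q'(U))I_n\Big),$$

implying (ii). (Here we used $\mathrm{Tr}_n(X_k) = 0$ and the fact that the traceless Hermitian matrix $f'(A) - \frac1n\mathrm{Tr}_n(f'(A))I_n$ equals its expansion in the orthonormal basis $\{X_k\}$.)

Set $F(t) := Q(e^{\sqrt{-1}t}) - \frac{\rho}{2}t^2$ for $t \in \mathbb{R}$. For any $U = e^{\sqrt{-1}A} \in SU(n)$ with $\sqrt{-1}A \in su(n)$ and for $(x_1,\dots,x_{n^2-1}) \in \mathbb{R}^{n^2-1}$, we have

$$\Psi\Big(\exp\Big(\sqrt{-1}A + \sum_{k}x_kY_k\Big)\Big) = \mathrm{Tr}_n F\Big(A + \sum_{k}x_kX_k\Big) + \frac{\rho}{2}\mathrm{Tr}_n\Big(\Big(A + \sum_{k}x_kX_k\Big)^2\Big)$$
$$= \mathrm{Tr}_n F\Big(A + \sum_{k}x_kX_k\Big) + \frac{\rho}{2}\mathrm{Tr}_n(A^2) + \rho\sum_{k}\mathrm{Tr}_n(AX_k)x_k + \frac{\rho}{2}\sum_{k}x_k^2.$$

Since $F(t)$ is convex on $\mathbb{R}$, it is known ([24, 3.1]) that $\mathrm{Tr}_n F(A + \sum_k x_kX_k)$ is convex in $(x_1,\dots,x_{n^2-1})$, so that (iii) follows. □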
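The linear-algebra step behind (ii) can be spot-checked numerically: expanding a Hermitian matrix $B$ in an orthonormal basis $\{X_k\}$ of the traceless Hermitian matrices should return exactly $B - \frac1n\mathrm{Tr}_n(B)I_n$. The plain-Python sketch below is our own illustration for $n = 2$; the basis of normalized Pauli matrices, the sample matrix and the helper names are assumptions made for the example. It also checks the norm identity $\|B - \frac1n\mathrm{Tr}_n(B)I_n\|_{HS}^2 = \mathrm{Tr}_n(B^2) - \frac1n(\mathrm{Tr}_nB)^2$, which is what produces the two terms in the Hilbert-Schmidt norm of the gradient in the proof of Theorem 3.3.

```python
import math

def matmul(X, Y):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def trace(X):
    return X[0][0] + X[1][1]

# Orthonormal basis {X_k} of the traceless Hermitian 2x2 matrices for <X, Y> = Tr(XY):
# the three Pauli matrices divided by sqrt(2).
s = 1 / math.sqrt(2)
BASIS = [
    [[0, s], [s, 0]],
    [[0, -1j * s], [1j * s, 0]],
    [[s, 0], [0, -s]],
]

def expand_in_basis(B):
    """Return sum_k Tr(B X_k) X_k, the coordinate expansion used for the gradient in (ii)."""
    out = [[0j, 0j], [0j, 0j]]
    for X in BASIS:
        c = trace(matmul(B, X))
        for i in range(2):
            for j in range(2):
                out[i][j] += c * X[i][j]
    return out

def traceless_part(B):
    """B - (1/n) Tr_n(B) I_n with n = 2."""
    t = trace(B) / 2
    return [[B[0][0] - t, B[0][1]], [B[1][0], B[1][1] - t]]

def hs_norm_sq(B):
    return sum(abs(B[i][j]) ** 2 for i in range(2) for j in range(2))

B = [[2, 1 - 1j], [1 + 1j, -0.5]]  # a Hermitian sample playing the role of f'(A)
G = expand_in_basis(B)
T = traceless_part(B)
print(max(abs(G[i][j] - T[i][j]) for i in range(2) for j in range(2)))  # zero up to rounding
# ||B - (Tr B / n) I||_HS^2 versus Tr(B^2) - (Tr B)^2 / n, with n = 2
print(hs_norm_sq(T), trace(matmul(B, B)).real - trace(B).real ** 2 / 2)
```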

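Lemma 3.2. Assume that $\mu \in \mathcal{M}(\mathbb{T})$ has a continuous density $p = d\mu/d\zeta$, and that $Q_\mu(\zeta) := 2\int_{\mathbb{T}}\log|\zeta-\eta|\,d\mu(\eta)$ is $C^1$ on $\mathbb{T}$. Then one has:

(i) $Q_\mu'(\zeta) = (Hp)(\zeta)$ for a.e. $\zeta \in \mathbb{T}$;

(ii) $\int_{\mathbb{T}}(Hp)(\zeta)p(\zeta)\,d\zeta = 0$.

Proof. (i) Let $f$ be an arbitrary $C^1$ function on $\mathbb{T}$. Then we have

$$\int_0^{2\pi}Q_\mu'(e^{\sqrt{-1}\theta})f(e^{\sqrt{-1}\theta})\frac{d\theta}{2\pi} = -\int_0^{2\pi}Q_\mu(e^{\sqrt{-1}\theta})\frac{d}{d\theta}f(e^{\sqrt{-1}\theta})\frac{d\theta}{2\pi}$$
$$= -\lim_{\varepsilon\searrow0}\int_0^{2\pi}\Big(\int_{|\theta-t|>\varepsilon}\log\big(2(1-\cos(\theta-t))\big)\frac{d}{d\theta}f(e^{\sqrt{-1}\theta})\frac{d\theta}{2\pi}\Big)p(e^{\sqrt{-1}t})\frac{dt}{2\pi},$$

where the second equality is due to the fact that $\log\big(2(1-\cos(\theta-t))\big)$ is bounded above. Integrating by parts we get

$$\int_{|\theta-t|>\varepsilon}\log\big(2(1-\cos(\theta-t))\big)\frac{d}{d\theta}f(e^{\sqrt{-1}\theta})\frac{d\theta}{2\pi} = \log\big(2(1-\cos\varepsilon)\big)\frac{f(e^{\sqrt{-1}(t-\varepsilon)}) - f(e^{\sqrt{-1}(t+\varepsilon)})}{2\pi} - \int_{|\theta-t|>\varepsilon}\frac{f(e^{\sqrt{-1}\theta})}{\tan\frac{\theta-t}{2}}\frac{d\theta}{2\pi},$$

and hence

$$\int_0^{2\pi}Q_\mu'(e^{\sqrt{-1}\theta})f(e^{\sqrt{-1}\theta})\frac{d\theta}{2\pi}$$
$$= \lim_{\varepsilon\searrow0}\Big\{-\log\big(2(1-\cos\varepsilon)\big)\int_0^{2\pi}\big(f(e^{\sqrt{-1}(t-\varepsilon)}) - f(e^{\sqrt{-1}(t+\varepsilon)})\big)p(e^{\sqrt{-1}t})\frac{dt}{(2\pi)^2} + \int_0^{2\pi}\Big(\int_{|\theta-t|>\varepsilon}\frac{f(e^{\sqrt{-1}\theta})}{\tan\frac{\theta-t}{2}}\frac{d\theta}{2\pi}\Big)p(e^{\sqrt{-1}t})\frac{dt}{2\pi}\Big\}$$
$$= \lim_{\varepsilon\searrow0}\int_0^{2\pi}f(e^{\sqrt{-1}\theta})\Big(\int_{|\theta-t|>\varepsilon}\frac{p(e^{\sqrt{-1}t})}{\tan\frac{\theta-t}{2}}\frac{dt}{2\pi}\Big)\frac{d\theta}{2\pi}$$
$$= \int_0^{2\pi}(Hp)(e^{\sqrt{-1}\theta})f(e^{\sqrt{-1}\theta})\frac{d\theta}{2\pi}.$$

In the above, the second equality comes from $f(e^{\sqrt{-1}(t+\varepsilon)}) - f(e^{\sqrt{-1}(t-\varepsilon)}) = O(\varepsilon)$ uniformly for $t \in [0,2\pi)$ (so that the boundary term is $O(\varepsilon\log\varepsilon) \to 0$), and, since we have in particular $p \in L^2(\mathbb{T})$, the last one comes from the $L^2$-convergence of the involved principal value integral to $Hp$ (see [11, 12.8.2 (2)]). Hence, the desired assertion follows since $f$ is arbitrary.

(ii) follows by taking the limit as $\varepsilon \searrow 0$ of

$$\int_0^{2\pi}\Big(\int_{|\theta-t|>\varepsilon}\frac{p(e^{\sqrt{-1}t})}{\tan\frac{\theta-t}{2}}\frac{dt}{2\pi}\Big)p(e^{\sqrt{-1}\theta})\frac{d\theta}{2\pi} = -\int_0^{2\pi}\Big(\int_{|t-\theta|>\varepsilon}\frac{p(e^{\sqrt{-1}\theta})}{\tan\frac{t-\theta}{2}}\frac{d\theta}{2\pi}\Big)p(e^{\sqrt{-1}t})\frac{dt}{2\pi},$$

whose right-hand side is the negative of the left-hand side with the variables renamed, so that both sides vanish for every $\varepsilon > 0$; the passage to the limit is justified by the $L^2$-convergence of the principal value integral as mentioned above. □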
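Lemma 3.2 (i) can be illustrated on the simplest nontrivial density. For $p(e^{\sqrt{-1}t}) = 1 + a\cos t$ with $|a| \le 1$ one has $Q_\mu(e^{\sqrt{-1}\theta}) = -a\cos\theta$ and $(Hp)(e^{\sqrt{-1}\theta}) = a\sin\theta$; these closed forms are standard Fourier computations that we supply, not formulas from the text. The sketch below (helper names are ours) computes both sides independently by the midpoint rule: the principal value with nodes placed symmetrically around the singular point, and $Q_\mu'$ by differentiating a quadrature of the logarithmic potential.

```python
import math

def density(t, a):
    """p(e^{it}) = 1 + a cos t, a probability density w.r.t. dt/2pi for |a| <= 1."""
    return 1.0 + a * math.cos(t)

def hilbert_pv(theta, a, N=2000):
    """Principal value of int p(e^{it}) / tan((theta - t)/2) dt/2pi by the midpoint rule.

    The nodes t = theta + (j + 1/2) h are symmetric around the singular point t = theta
    (mod 2pi), so the odd kernel cancels pairwise as in a principal value."""
    h = 2 * math.pi / N
    total = 0.0
    for j in range(N):
        t = theta + (j + 0.5) * h
        total += density(t, a) / math.tan((theta - t) / 2)
    return total * h / (2 * math.pi)

def log_potential(theta, a, N=2000):
    """Q_mu(e^{i theta}) = int log(2(1 - cos(theta - t))) p(e^{it}) dt/2pi by the midpoint rule.

    The integrable log singularity at t = theta is never sampled; the error is O(1/N)."""
    h = 2 * math.pi / N
    total = 0.0
    for j in range(N):
        u = (j + 0.5) * h
        total += math.log(2 * (1 - math.cos(u))) * density(theta + u, a)
    return total * h / (2 * math.pi)

a, theta, d = 0.5, 1.0, 1e-3
hp = hilbert_pv(theta, a)
dq = (log_potential(theta + d, a) - log_potential(theta - d, a)) / (2 * d)
print(hp, dq, a * math.sin(theta))  # the three numbers agree to about three decimals
```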

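Theorem 3.3. Let $Q$ be a real-valued $C^1$ function on $\mathbb{T}$ such that $Q(e^{\sqrt{-1}t}) - \frac{\rho}{2}t^2$ is convex on $\mathbb{R}$ with a constant $\rho > -1/2$. Then, for every $\mu \in \mathcal{M}(\mathbb{T})$ one has

$$\tilde\Sigma_Q(\mu) \le \frac{1}{1+2\rho}F_Q(\mu). \tag{3.1}$$

In the special case where $Q = 0$ and $\rho = 0$, the above (3.1) becomes

$$-\Sigma(\mu) \le F(\mu),$$

and the equilibrium measure $\mu_Q$ is the uniform distribution $d\theta/2\pi$.

In particular, since $\tilde\Sigma_Q(\mu) \ge 0$, the theorem implies that $F_Q(\mu) \ge 0$; that is,

$$\int_{\mathbb{T}}\big((Hp)(\zeta) - Q'(\zeta)\big)^2\,d\mu(\zeta) \ge \Big(\int_{\mathbb{T}}Q'(\zeta)\,d\mu(\zeta)\Big)^2$$

for every $\mu \in \mathcal{M}(\mathbb{T})$ under the above assumption on $Q$. Also, suppose that the equilibrium measure $\mu_Q$ has a continuous density and its support is $\mathbb{T}$; then we have $Q(\zeta) = 2\int_{\mathbb{T}}\log|\zeta - \eta|\,d\mu_Q(\eta)$ for all $\zeta \in \mathbb{T}$ due to [26, Theorem 1.3.1], so that Lemma 3.2 gives $F_Q(\mu_Q) = 0$.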

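Proof of Theorem 3.3. First, let us assume:

(a) $Q$ is harmonic on a neighborhood of the unit disk;

(b) $\mu$ has a continuous density $p = d\mu/d\zeta$, and $Q_\mu(\zeta) := 2\int_{\mathbb{T}}\log|\zeta - \eta|\,d\mu(\eta)$ is harmonic on a neighborhood of the unit disk.

For each $n \in \mathbb{N}$ define $n\times n$ special unitary random matrices $\lambda_n^{SU}(Q)$ and $\lambda_n^{SU}(Q_\mu)$ as in (1.16), i.e.,

$$d\lambda_n^{SU}(Q)(U) := \frac{1}{\tilde Z_n^{SU}(Q)}\exp\big(-n\mathrm{Tr}_n(Q(U))\big)\,dU,$$
$$d\lambda_n^{SU}(Q_\mu)(U) := \frac{1}{\tilde Z_n^{SU}(Q_\mu)}\exp\big(-n\mathrm{Tr}_n(Q_\mu(U))\big)\,dU.$$

Let $\tilde\lambda_n^{SU}(Q)$ and $\tilde\lambda_n^{SU}(Q_\mu)$ be their joint eigenvalue distributions on $\mathbb{T}^{n-1}$. Also, let $\bar\lambda_n^{SU}(Q)$ and $\bar\lambda_n^{SU}(Q_\mu)$ be their mean eigenvalue distributions (see §§1.6). According to Theorem 1.2, the empirical eigenvalue distribution of $\lambda_n^{SU}(Q_\mu)$ satisfies the large deviation principle in the scale $1/n^2$ with rate function $\tilde\Sigma_{Q_\mu}(\cdot)$. Moreover, note ([26, Theorem 1.3.1]) that the equilibrium measure associated with $Q_\mu$ (i.e., the minimizer of $\tilde\Sigma_{Q_\mu}$) is the given $\mu$. This large deviation principle guarantees the following facts (i) and (ii), which will be the key ingredients in our arguments below.

(i) $\bar\lambda_n^{SU}(Q_\mu) \to \mu$ weakly as $n \to \infty$;

(ii) the empirical distribution $\frac1n(\delta_{\zeta_1} + \cdots + \delta_{\zeta_n})$ converges weakly to $\mu$ almost surely as $n \to \infty$ when $(\zeta_1,\dots,\zeta_{n-1})$ is distributed according to $\tilde\lambda_n^{SU}(Q_\mu)$ and $\zeta_n = (\zeta_1\cdots\zeta_{n-1})^{-1}$.

Set $\Psi_n(U) := n\mathrm{Tr}_n(Q(U))$ for $U \in SU(n)$. Lemma 3.1 (iii) and (1.19) verify the Bakry and Emery criterion:

$$\mathrm{Ric}(SU(n)) + \mathrm{Hess}(\Psi_n) \ge \Big(\frac n2 + n\rho\Big)I_{n^2-1}. \tag{3.2}$$

Thus, by Theorem 2.1 due to Bakry and Emery we get

$$S\big(\lambda_n^{SU}(Q_\mu),\lambda_n^{SU}(Q)\big) \le \frac{1}{2\big(\frac n2 + n\rho\big)}\int_{SU(n)}\Big\|\nabla\log\frac{d\lambda_n^{SU}(Q_\mu)}{d\lambda_n^{SU}(Q)}(U)\Big\|_{HS}^2\,d\lambda_n^{SU}(Q_\mu)(U). \tag{3.3}$$

Notice

$$\frac{d\lambda_n^{SU}(Q_\mu)}{d\lambda_n^{SU}(Q)}(U) = \frac{Z_n^{SU}(Q)}{Z_n^{SU}(Q_\mu)}\exp\big(-n\mathrm{Tr}_n(Q_\mu(U)) + n\mathrm{Tr}_n(Q(U))\big),\qquad U \in SU(n), \tag{3.4}$$

where $Z_n^{SU}(Q)$ and $Z_n^{SU}(Q_\mu)$ are the normalization constants of the joint eigenvalue distributions (see §§1.6). Hence, we have

$$\frac{1}{n^2}S\big(\lambda_n^{SU}(Q_\mu),\lambda_n^{SU}(Q)\big) = \frac{1}{n^2}\int_{SU(n)}\log\frac{d\lambda_n^{SU}(Q_\mu)}{d\lambda_n^{SU}(Q)}(U)\,d\lambda_n^{SU}(Q_\mu)(U)$$
$$= \frac{1}{n^2}\log Z_n^{SU}(Q) - \frac{1}{n^2}\log Z_n^{SU}(Q_\mu) - \int_{SU(n)}\frac1n\mathrm{Tr}_n(Q_\mu(U))\,d\lambda_n^{SU}(Q_\mu)(U) + \int_{SU(n)}\frac1n\mathrm{Tr}_n(Q(U))\,d\lambda_n^{SU}(Q_\mu)(U)$$
$$= \frac{1}{n^2}\log Z_n^{SU}(Q) - \frac{1}{n^2}\log Z_n^{SU}(Q_\mu) - \int_{\mathbb{T}}Q_\mu(\zeta)\,d\bar\lambda_n^{SU}(Q_\mu)(\zeta) + \int_{\mathbb{T}}Q(\zeta)\,d\bar\lambda_n^{SU}(Q_\mu)(\zeta),$$

and therefore, thanks to (b) and (i) above,

$$\lim_{n\to\infty}\frac{1}{n^2}S\big(\lambda_n^{SU}(Q_\mu),\lambda_n^{SU}(Q)\big) = B(Q) - B(Q_\mu) - \int_{\mathbb{T}}Q_\mu(\zeta)\,d\mu(\zeta) + \int_{\mathbb{T}}Q(\zeta)\,d\mu(\zeta) = \tilde\Sigma_Q(\mu), \tag{3.5}$$

where the last equality holds because $\mu$ is the minimizer with $\tilde\Sigma_{Q_\mu}(\mu) = 0$, i.e.,

$$-\Sigma(\mu) + \int_{\mathbb{T}}Q_\mu(\zeta)\,d\mu(\zeta) + B(Q_\mu) = 0.$$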

Therefore, the scaling limit in the scale $1/n^2$ of the left-hand side of (3.3) is the relative free entropy $\tilde\Sigma_Q(\mu)$. We now seek the scaling limit in the scale $1/n^2$ of the right-hand side of (3.3). By (3.4) and Lemma 3.1 (ii), we have

$$\nabla\log\frac{d\lambda_n^{SU}(Q_\mu)}{d\lambda_n^{SU}(Q)}(U) = -n\nabla\big(\mathrm{Tr}_n(Q_\mu(U)) - \mathrm{Tr}_n(Q(U))\big)$$
$$= -\sqrt{-1}\,n\Big(\big(Q_\mu'(U) - Q'(U)\big) - \frac1n\mathrm{Tr}_n\big(Q_\mu'(U) - Q'(U)\big)I_n\Big),$$

so that

$$\Big\|\nabla\log\frac{d\lambda_n^{SU}(Q_\mu)}{d\lambda_n^{SU}(Q)}(U)\Big\|_{HS}^2 = n^2\mathrm{Tr}_n\big((Q_\mu'(U) - Q'(U))^2\big) - n\big(\mathrm{Tr}_n(Q_\mu'(U) - Q'(U))\big)^2.$$



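Thus, we get

$$\frac{1}{n^2}\cdot\frac{1}{2\big(\frac n2 + n\rho\big)}\int_{SU(n)}\Big\|\nabla\log\frac{d\lambda_n^{SU}(Q_\mu)}{d\lambda_n^{SU}(Q)}(U)\Big\|_{HS}^2\,d\lambda_n^{SU}(Q_\mu)(U)$$
$$= \frac{1}{1+2\rho}\Big\{\int_{SU(n)}\frac1n\mathrm{Tr}_n\big((Q_\mu'(U) - Q'(U))^2\big)\,d\lambda_n^{SU}(Q_\mu)(U) - \int_{SU(n)}\Big(\frac1n\mathrm{Tr}_n\big(Q_\mu'(U) - Q'(U)\big)\Big)^2\,d\lambda_n^{SU}(Q_\mu)(U)\Big\}.$$

The above-mentioned fact (i) implies that

$$\int_{SU(n)}\frac1n\mathrm{Tr}_n\big((Q_\mu'(U) - Q'(U))^2\big)\,d\lambda_n^{SU}(Q_\mu)(U) = \int_{\mathbb{T}}\big(Q_\mu'(\zeta) - Q'(\zeta)\big)^2\,d\bar\lambda_n^{SU}(Q_\mu)(\zeta) \longrightarrow \int_{\mathbb{T}}\big(Q_\mu'(\zeta) - Q'(\zeta)\big)^2\,d\mu(\zeta)\quad\text{as } n\to\infty,$$

while the above fact (ii) implies that

$$\int_{SU(n)}\Big(\frac1n\mathrm{Tr}_n\big(Q_\mu'(U) - Q'(U)\big)\Big)^2\,d\lambda_n^{SU}(Q_\mu)(U) = \int_{\mathbb{T}^{n-1}}\Big(\frac1n\sum_{i=1}^n\big(Q_\mu'(\zeta_i) - Q'(\zeta_i)\big)\Big)^2\,d\tilde\lambda_n^{SU}(Q_\mu)(\zeta_1,\dots,\zeta_{n-1})$$

with $\zeta_n := (\zeta_1\cdots\zeta_{n-1})^{-1}$, and this tends to $\big(\int_{\mathbb{T}}(Q_\mu'(\zeta) - Q'(\zeta))\,d\mu(\zeta)\big)^2$ as $n \to \infty$. Thanks to the assumption (b), Lemma 3.2 implies that $\int_{\mathbb{T}}(Q_\mu'(\zeta) - Q'(\zeta))^2\,d\mu(\zeta) = \int_{\mathbb{T}}((Hp)(\zeta) - Q'(\zeta))^2\,d\mu(\zeta)$ and

$$\Big(\int_{\mathbb{T}}\big(Q_\mu'(\zeta) - Q'(\zeta)\big)\,d\mu(\zeta)\Big)^2 = \Big(\int_{\mathbb{T}}(Hp)(\zeta)p(\zeta)\,d\zeta - \int_{\mathbb{T}}Q'(\zeta)\,d\mu(\zeta)\Big)^2 = \Big(\int_{\mathbb{T}}Q'(\zeta)\,d\mu(\zeta)\Big)^2,$$

so that we get

$$\lim_{n\to\infty}\frac{1}{n^2}\cdot\frac{1}{2\big(\frac n2 + n\rho\big)}\int_{SU(n)}\Big\|\nabla\log\frac{d\lambda_n^{SU}(Q_\mu)}{d\lambda_n^{SU}(Q)}(U)\Big\|_{HS}^2\,d\lambda_n^{SU}(Q_\mu)(U) = \frac{1}{1+2\rho}F_Q(\mu). \tag{3.6}$$

By (3.3), (3.5) and (3.6) we have shown the desired inequality (3.1) under the assumptions (a) and (b).

Next, let us deal with a general $Q$ as stated in the theorem. Let $\mu \in \mathcal{M}(\mathbb{T})$ with a density $p = d\mu/d\zeta \in L^3(\mathbb{T})$. For each $0 < r < 1$, we consider the Poisson integrals $Q_r$ and $p_r$ of $Q$ and $p$, respectively; that is,

$$Q_r(e^{\sqrt{-1}\theta}) := \int_0^{2\pi}P_r(\theta - t)Q(e^{\sqrt{-1}t})\frac{dt}{2\pi},\qquad p_r(e^{\sqrt{-1}\theta}) := \int_0^{2\pi}P_r(\theta - t)p(e^{\sqrt{-1}t})\frac{dt}{2\pi},$$

with the Poisson kernel $P_r(\theta) := (1 - r^2)/(1 - 2r\cos\theta + r^2)$. Define $\mu_r \in \mathcal{M}(\mathbb{T})$ by $d\mu_r(\zeta) := p_r(\zeta)\,d\zeta$. Then it is plain to see that $Q_r$ satisfies the assumption (a) and that $\mu_r$ satisfies (b). The convexity assumption on $Q$ in the theorem means that

$$\lambda Q(e^{\sqrt{-1}s}) + (1-\lambda)Q(e^{\sqrt{-1}t}) - Q\big(e^{\sqrt{-1}(\lambda s + (1-\lambda)t)}\big) \ge \frac{\rho}{2}\lambda(1-\lambda)(t-s)^2$$

for all $s, t \in \mathbb{R}$ and $0 < \lambda < 1$. It is easy to check that each $Q_r$, $0 < r < 1$, satisfies the same convexity assumption, so that

$$\tilde\Sigma_{Q_r}(\mu_r) \le \frac{1}{1+2\rho}F_{Q_r}(\mu_r) \tag{3.7}$$

by what we have already shown. It is known (see [16] and also [17, p. 224]) that $\mu_r \to \mu$ weakly and $\Sigma(\mu_r) \to \Sigma(\mu)$ as $r \nearrow 1$. Moreover, it is known (see [19, 5.3.2]) that $\|Q_r - Q\|_\infty \to 0$ as $r \nearrow 1$, where $\|\cdot\|_\infty$ means the uniform norm on $C(\mathbb{T})$. Since it is easily seen that

$$\Big|\frac{1}{n^2}\log Z_n^{SU}(Q_r) - \frac{1}{n^2}\log Z_n^{SU}(Q)\Big| \le \|Q_r - Q\|_\infty,$$

we have $B(Q_r) \to B(Q)$ as $r \nearrow 1$. Therefore, we get

$$\lim_{r\nearrow1}\tilde\Sigma_{Q_r}(\mu_r) = \tilde\Sigma_Q(\mu).$$

Notice that $\|p_r - p\|_{L^3} \to 0$ and hence $\|Hp_r - Hp\|_{L^3} \to 0$ as $r \nearrow 1$. Since $Q$ is a $C^1$ function, $Q_r'$ is the Poisson integral of $Q'$, so that $\|Q_r' - Q'\|_\infty \to 0$ as $r \nearrow 1$ as well. These imply that

$$\lim_{r\nearrow1}F_{Q_r}(\mu_r) = \lim_{r\nearrow1}\Big\{\int_{\mathbb{T}}\big((Hp_r)(\zeta) - Q_r'(\zeta)\big)^2\,d\mu_r(\zeta) - \Big(\int_{\mathbb{T}}Q_r'(\zeta)\,d\mu_r(\zeta)\Big)^2\Big\}$$
$$= \int_{\mathbb{T}}\big((Hp)(\zeta) - Q'(\zeta)\big)^2\,d\mu(\zeta) - \Big(\int_{\mathbb{T}}Q'(\zeta)\,d\mu(\zeta)\Big)^2 = F_Q(\mu).$$

Hence, the desired inequality (3.1) follows by taking the limit of (3.7). □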



4. Free TCI for measures on R 

The second aim of this paper is to obtain the free analog of transportation cost inequalities for measures on $\mathbb{R}$ and on $\mathbb{T}$. We deal with probability measures on $\mathbb{R}$ in this section and those on $\mathbb{T}$ in the next section. The (classical) transportation cost inequalities compare the Wasserstein distance with the relative entropy (see (1.1)) for two probability measures. Let us first recall the definition of the Wasserstein distance. Let $X$ be a Polish space with a metric $d$. The (quadratic) Wasserstein distance between $\mu, \nu \in \mathcal{M}(X)$ is defined by

$$W(\mu,\nu) := \inf_{\pi\in\Pi(\mu,\nu)}\sqrt{\iint_{X\times X}\frac12 d(x,y)^2\,d\pi(x,y)}, \tag{4.1}$$

where $\Pi(\mu,\nu)$ denotes the set of all probability measures on $X\times X$ with marginals $\mu$ and $\nu$, i.e., $\pi(\,\cdot\times X) = \mu$ and $\pi(X\times\cdot\,) = \nu$. The Wasserstein distance is sometimes defined with the integral of $d(x,y)^2$ instead of $\frac12 d(x,y)^2$. The next lemma is well known and easy to show.

Lemma 4.1. $W(\mu,\nu)$ is weakly lower semicontinuous in $\mu, \nu \in \mathcal{M}(X)$; namely, if $\mu_n, \nu_n \in \mathcal{M}(X)$, $\mu_n \to \mu$ and $\nu_n \to \nu$ in the weak topology, then

$$W(\mu,\nu) \le \liminf_{n\to\infty}W(\mu_n,\nu_n).$$
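For measures on $\mathbb{R}$ with finitely many equally weighted atoms, the infimum in (4.1) can be computed directly: by the Birkhoff-von Neumann theorem it is attained at a permutation coupling, and for the quadratic cost on $\mathbb{R}$ the monotone (sorted) matching is optimal. The following Python sketch is our own illustration (the helper names `w_sorted` and `w_bruteforce` are hypothetical); it compares the sorted matching with a brute-force search over all permutations, keeping the factor $\frac12$ of (4.1).

```python
import itertools
import math
import random

def w_sorted(xs, ys):
    """W of (4.1), cost (1/2)|x - y|^2, between (1/n) sum_i delta_{x_i} and (1/n) sum_i delta_{y_i}
    on R, computed with the monotone (sorted) matching."""
    n = len(xs)
    return math.sqrt(sum(0.5 * (x - y) ** 2 for x, y in zip(sorted(xs), sorted(ys))) / n)

def w_bruteforce(xs, ys):
    """Same quantity, minimizing over all permutation couplings (enough by Birkhoff-von Neumann)."""
    n = len(xs)
    best = min(sum(0.5 * (xs[i] - ys[s[i]]) ** 2 for i in range(n))
               for s in itertools.permutations(range(n)))
    return math.sqrt(best / n)

print(w_sorted([0.0, 1.0, 3.0], [2.0, -1.0, 0.5]))  # sqrt(0.375), about 0.612
random.seed(1)
for _ in range(200):
    xs = [random.uniform(-3, 3) for _ in range(5)]
    ys = [random.uniform(-3, 3) for _ in range(5)]
    assert abs(w_sorted(xs, ys) - w_bruteforce(xs, ys)) < 1e-12
```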

In the typical case where $X = \mathbb{R}^n$ and $d(x,y) = \|x - y\|$, the usual Euclidean metric, let $g_n$ be the standard Gaussian measure, i.e., $dg_n(x) := (2\pi)^{-n/2}e^{-\|x\|^2/2}\,dx$ ($dx$ means the Lebesgue measure on $\mathbb{R}^n$). The celebrated transportation cost inequality (TCI for short) of Talagrand [28] is

$$W(\mu,g_n) \le \sqrt{S(\mu,g_n)},\qquad \mu \in \mathcal{M}(\mathbb{R}^n).$$

This inequality extends slightly as follows (see [21]):

Theorem 4.2. Let $\Psi : \mathbb{R}^n \to \mathbb{R}$ and assume that $\Psi(x) - \frac{\rho}{2}\|x\|^2$ is convex on $\mathbb{R}^n$ with a constant $\rho > 0$. If $d\nu(x) := \frac1Z e^{-\Psi(x)}\,dx \in \mathcal{M}(\mathbb{R}^n)$ with a normalization constant $Z$, then

$$W(\mu,\nu) \le \sqrt{\frac1\rho S(\mu,\nu)},\qquad \mu \in \mathcal{M}(\mathbb{R}^n).$$
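As a sanity check of Theorem 4.2 in the simplest case $n = 1$, take $\Psi(x) = \rho x^2/2$, so that $\nu$ is the centered Gaussian with variance $\tau^2 = 1/\rho$, and let $\mu = N(m, s^2)$. Both sides then have classical closed forms (these Gaussian formulas are standard facts we supply, not taken from the text): $W(\mu,\nu)^2 = \frac12\big(m^2 + (s-\tau)^2\big)$ and $S(\mu,\nu) = \frac12\big((s^2+m^2)/\tau^2 - 1 - \log(s^2/\tau^2)\big)$. A short computation gives $\frac1\rho S(\mu,\nu) - W(\mu,\nu)^2 = \tau^2(u - 1 - \log u) \ge 0$ with $u = s/\tau$, with equality for pure translations. The sketch below (hypothetical helper names) checks this on a small grid.

```python
import math

def w_gauss(m, s, tau):
    """W of (4.1) (cost (1/2)|x - y|^2) between N(m, s^2) and N(0, tau^2) on R."""
    return math.sqrt(0.5 * (m ** 2 + (s - tau) ** 2))

def rel_entropy_gauss(m, s, tau):
    """Relative entropy S(N(m, s^2), N(0, tau^2))."""
    return 0.5 * ((s ** 2 + m ** 2) / tau ** 2 - 1.0 - math.log(s ** 2 / tau ** 2))

for rho in (0.5, 1.0, 3.0):
    tau = 1 / math.sqrt(rho)          # nu = (1/Z) e^{-rho x^2/2} dx = N(0, 1/rho)
    for m in (-1.0, 0.0, 2.0):
        for s in (0.3, tau, 2.5):
            lhs = w_gauss(m, s, tau)
            rhs = math.sqrt(rel_entropy_gauss(m, s, tau) / rho)
            assert lhs <= rhs + 1e-12  # Theorem 4.2 with n = 1
```

Equality for translations ($s = \tau$) shows that the constant $1/\rho$ cannot be improved in Theorem 4.2.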

In [25] Otto and Villani established the interrelation between LSI and TCI by a technique using partial differential equations. Their result, combined with Bakry and Emery's LSI ([1] or Theorem 2.1), implies the following TCI in a setup on Riemannian manifolds, which will play a crucial role in deriving our free analog of TCI for measures on $\mathbb{T}$. In the theorem, let $M$ be an $m$-dimensional smooth complete Riemannian manifold equipped with the geodesic distance $d(x,y)$ and the volume measure $dx$.

Theorem 4.3. (Bakry and Emery [1] and Otto and Villani [25]) Let $\Psi$ be a real-valued $C^2$ function on $M$ and set $d\nu(x) := \frac1Z e^{-\Psi(x)}\,dx \in \mathcal{M}(M)$ with a normalization constant $Z$. If the Bakry and Emery criterion $\mathrm{Ric}(M) + \mathrm{Hess}(\Psi) \ge \rho I_m$ holds with a constant $\rho > 0$, then

$$W(\mu,\nu) \le \sqrt{\frac1\rho S(\mu,\nu)},\qquad \mu \in \mathcal{M}(M).$$

On the other hand, the following free analog of Talagrand's TCI was shown by Biane and Voiculescu [6]. Recall that $\gamma_{0,2}$ is the standard semicircular measure (see (1.10)).

Theorem 4.4. (Biane and Voiculescu [6]) For every compactly supported $\mu \in \mathcal{M}(\mathbb{R})$,

$$W(\mu,\gamma_{0,2}) \le \sqrt{-\Sigma(\mu) + \int\frac{x^2}{2}\,d\mu(x) - \frac34}. \tag{4.2}$$

In the rest of this section we will present a new proof of the above free TCI in a more general situation by using a random matrix technique. In fact, the classical TCI on the matrix space $M_n^{sa}$ asymptotically approaches the free analog as the matrix size goes to $\infty$. The following is our free TCI for probability measures on $\mathbb{R}$, where the relative entropy in the classical TCI is replaced by the relative free entropy (1.14).

Theorem 4.5. Let $Q$ be a real-valued function on $\mathbb{R}$. If $Q(x) - \frac{\rho}{2}x^2$ is convex on $\mathbb{R}$ with a constant $\rho > 0$, then

$$W(\mu,\mu_Q) \le \sqrt{\frac1\rho\tilde\Sigma_Q(\mu)} \tag{4.3}$$

for every compactly supported $\mu \in \mathcal{M}(\mathbb{R})$.

In particular, when $Q(x) = x^2/2$ and so $\rho = 1$, the relative free entropy $\tilde\Sigma_Q(\mu)$ is the expression inside the square root in (4.2) and its minimizer is $\gamma_{0,2}$, so that Theorem 4.5 is a generalization of Theorem 4.4.

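The next lemma will play a key role in our proof of the theorem.

Lemma 4.6. Let $\tilde\mu, \tilde\nu \in \mathcal{M}(M_n^{sa})$ and let $\mu, \nu$ be the mean eigenvalue distributions on $\mathbb{R}$ of $\tilde\mu, \tilde\nu$, respectively. Then

$$W(\mu,\nu) \le \frac{1}{\sqrt n}W(\tilde\mu,\tilde\nu),$$

where $W(\tilde\mu,\tilde\nu)$ is the Wasserstein distance with respect to the distance induced by the Hilbert-Schmidt norm $\|\cdot\|_{HS}$ on $M_n^{sa}$.

Proof. For $A \in M_n^{sa}$ let $\lambda_1(A), \dots, \lambda_n(A)$ be the eigenvalues of $A$ in increasing order, counting multiplicities. The mean eigenvalue distribution $\mu$ is written as

$$\mu(F) = \int_{M_n^{sa}}\frac1n\#\{i : \lambda_i(A) \in F\}\,d\tilde\mu(A)\qquad\text{for Borel sets } F \subset \mathbb{R}.$$

For each $\tilde\pi \in \Pi(\tilde\mu,\tilde\nu)$ define $\pi \in \mathcal{M}(\mathbb{R}\times\mathbb{R})$ by

$$\pi(G) := \iint_{M_n^{sa}\times M_n^{sa}}\frac1n\#\{i : (\lambda_i(A),\lambda_i(B)) \in G\}\,d\tilde\pi(A,B)$$

for Borel sets $G \subset \mathbb{R}\times\mathbb{R}$. Since

$$\pi(F\times\mathbb{R}) = \int_{M_n^{sa}}\frac1n\#\{i : \lambda_i(A) \in F\}\,d\tilde\mu(A) = \mu(F)$$

and similarly $\pi(\mathbb{R}\times F) = \nu(F)$ for $F \subset \mathbb{R}$, we get $\pi \in \Pi(\mu,\nu)$, so that

$$W(\mu,\nu)^2 \le \iint_{\mathbb{R}\times\mathbb{R}}\frac12(x-y)^2\,d\pi(x,y) = \iint_{M_n^{sa}\times M_n^{sa}}\Big[\iint_{\mathbb{R}\times\mathbb{R}}\frac12(x-y)^2\,d\Big(\frac1n\sum_{i=1}^n\delta_{(\lambda_i(A),\lambda_i(B))}\Big)(x,y)\Big]\,d\tilde\pi(A,B)$$
$$= \frac1n\iint_{M_n^{sa}\times M_n^{sa}}\frac12\sum_{i=1}^n\big(\lambda_i(A) - \lambda_i(B)\big)^2\,d\tilde\pi(A,B).$$

The famous Lidskii-Wielandt majorization for Hermitian matrices (see [3]) implies that

$$\sum_{i=1}^n\big(\lambda_i(A) - \lambda_i(B)\big)^2 \le \sum_{i=1}^n\lambda_i(A - B)^2 = \|A - B\|_{HS}^2$$

for all $A, B \in M_n^{sa}$. Therefore,

$$W(\mu,\nu)^2 \le \frac1n\iint_{M_n^{sa}\times M_n^{sa}}\frac12\|A - B\|_{HS}^2\,d\tilde\pi(A,B),$$

and taking the infimum over $\tilde\pi \in \Pi(\tilde\mu,\tilde\nu)$ gives $W(\mu,\nu)^2 \le \frac1n W(\tilde\mu,\tilde\nu)^2$. □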
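The matrix inequality that drives Lemma 4.6, $\sum_i(\lambda_i(A) - \lambda_i(B))^2 \le \|A - B\|_{HS}^2$ for increasingly ordered eigenvalues (often quoted as the Hoffman-Wielandt inequality), can be spot-checked numerically. Taking $\tilde\pi = \delta_{(A,B)}$ in the proof, it is exactly what bounds the transport cost of the induced eigenvalue coupling. The sketch below restricts to real symmetric $2\times2$ matrices, an assumption made only so that the eigenvalues have a closed form; the helper names are hypothetical.

```python
import math
import random

def eig_sym2(a, b, c):
    """Eigenvalues, in increasing order, of the real symmetric matrix [[a, b], [b, c]]."""
    mean, rad = (a + c) / 2, math.hypot((a - c) / 2, b)
    return (mean - rad, mean + rad)

def hs_dist_sq(A, B):
    """||A - B||_HS^2 for symmetric 2x2 matrices given as (a, b, c) triples."""
    return (A[0] - B[0]) ** 2 + 2 * (A[1] - B[1]) ** 2 + (A[2] - B[2]) ** 2

def ordered_eig_dist_sq(A, B):
    """sum_i (lambda_i(A) - lambda_i(B))^2 with both spectra in increasing order."""
    return sum((x - y) ** 2 for x, y in zip(eig_sym2(*A), eig_sym2(*B)))

random.seed(0)
worst = max(ordered_eig_dist_sq(A, B) - hs_dist_sq(A, B)
            for A, B in (([random.uniform(-5, 5) for _ in range(3)],
                          [random.uniform(-5, 5) for _ in range(3)]) for _ in range(10000)))
print(worst <= 1e-9)  # True: the spectral distance never exceeds the Hilbert-Schmidt distance
```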

Proof of Theorem 4.5. First, let $\mu \in \mathcal{M}(\mathbb{R})$ be compactly supported, and suppose that the function $Q_\mu(x) := 2\int\log|x - y|\,d\mu(y)$ is finite and continuous on the whole $\mathbb{R}$. Choose $R > 0$ so that $\mu$ is supported in $[-R,R]$. For each $n \in \mathbb{N}$ consider the $n\times n$ self-adjoint random matrix $\lambda_n(Q_\mu;R) \in \mathcal{M}(M_n^{sa})$ supported in $\{A \in M_n^{sa} : \|A\|_\infty \le R\}$ as well as $\lambda_n(Q) \in \mathcal{M}(M_n^{sa})$ (see §§1.4 and §§1.5). Here, note that the condition (1.8) is automatically satisfied under the convexity assumption on $Q$. Since the corresponding large deviation principle guarantees the weak convergence of the mean eigenvalue distribution $\bar\lambda_n(Q)$ (resp. $\bar\lambda_n(Q_\mu;R)$) to $\mu_Q$ (resp. $\mu$), Lemma 4.1 gives

$$W(\mu,\mu_Q) \le \liminf_{n\to\infty}W\big(\bar\lambda_n(Q_\mu;R),\bar\lambda_n(Q)\big). \tag{4.4}$$

By Lemma 4.6 we get

$$W\big(\bar\lambda_n(Q_\mu;R),\bar\lambda_n(Q)\big) \le \frac{1}{\sqrt n}W\big(\lambda_n(Q_\mu;R),\lambda_n(Q)\big). \tag{4.5}$$

Set $\Psi_n(A) := n\mathrm{Tr}_n(Q(A))$ for $A \in M_n^{sa}$; then $d\lambda_n(Q)(A) = \frac{1}{\tilde Z_n(Q)}e^{-\Psi_n(A)}\,dA$. Since $Q(x) - \frac\rho2 x^2$ is convex on $\mathbb{R}$, so is

$$\Psi_n(A) - \frac{n\rho}{2}\|A\|_{HS}^2 = n\mathrm{Tr}_n\Big(Q(A) - \frac\rho2 A^2\Big)\quad\text{on } M_n^{sa}.$$

Also, note that $\|\cdot\|_{HS}$ corresponds to the Euclidean norm on $\mathbb{R}^{n^2}$ under the isometry $A = [A_{ij}] \in M_n^{sa} \mapsto \big((A_{ii})_{1\le i\le n},(\sqrt2\,\mathrm{Re}A_{ij},\sqrt2\,\mathrm{Im}A_{ij})_{i<j}\big) \in \mathbb{R}^{n^2}$. Hence, Theorem 4.2 implies that

$$W\big(\lambda_n(Q_\mu;R),\lambda_n(Q)\big) \le \sqrt{\frac{1}{\rho n}S\big(\lambda_n(Q_\mu;R),\lambda_n(Q)\big)}. \tag{4.6}$$

Similarly to the case of special unitary random matrices in the proof of Theorem 3.3, since

$$\frac{d\lambda_n(Q_\mu;R)}{d\lambda_n(Q)}(A) = \frac{Z_n(Q)}{Z_n(Q_\mu;R)}\exp\big(-n\mathrm{Tr}_n(Q_\mu(A)) + n\mathrm{Tr}_n(Q(A))\big)$$

on $(M_n^{sa})_R := \{A \in M_n^{sa} : \|A\|_\infty \le R\}$, we have

$$\frac{1}{n^2}S\big(\lambda_n(Q_\mu;R),\lambda_n(Q)\big) = \frac{1}{n^2}\log Z_n(Q) - \frac{1}{n^2}\log Z_n(Q_\mu;R) - \int_{(M_n^{sa})_R}\frac1n\mathrm{Tr}_n(Q_\mu(A))\,d\lambda_n(Q_\mu;R)(A) + \int_{(M_n^{sa})_R}\frac1n\mathrm{Tr}_n(Q(A))\,d\lambda_n(Q_\mu;R)(A)$$
$$\longrightarrow B(Q) - B(Q_\mu;R) - \int_{\mathbb{R}}Q_\mu(x)\,d\mu(x) + \int_{\mathbb{R}}Q(x)\,d\mu(x) = \tilde\Sigma_Q(\mu) \tag{4.7}$$

as $n \to \infty$, thanks to the fact that $\mu$ is the minimizer of the rate function (1.15) with $Q_\mu$ in place of $Q$. Combining (4.4)-(4.7) implies the inequality (4.3) under the continuity assumption on $Q_\mu(x)$.
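For the reader's convenience, the way the factors of $n$ cancel when (4.4)-(4.7) are chained can be written out as follows. This is only a restatement of the argument above, with $\bar\lambda_n$ denoting mean eigenvalue distributions.

```latex
\begin{aligned}
W(\mu,\mu_Q)^2
&\le \liminf_{n\to\infty} W\big(\bar\lambda_n(Q_\mu;R),\bar\lambda_n(Q)\big)^2
  && \text{by (4.4)}\\
&\le \liminf_{n\to\infty} \frac{1}{n}\, W\big(\lambda_n(Q_\mu;R),\lambda_n(Q)\big)^2
  && \text{by (4.5)}\\
&\le \liminf_{n\to\infty} \frac{1}{n}\cdot\frac{1}{\rho n}\, S\big(\lambda_n(Q_\mu;R),\lambda_n(Q)\big)
  && \text{by (4.6)}\\
&= \frac{1}{\rho}\lim_{n\to\infty}\frac{1}{n^2}\, S\big(\lambda_n(Q_\mu;R),\lambda_n(Q)\big)
 = \frac{1}{\rho}\,\tilde\Sigma_Q(\mu)
  && \text{by (4.7)}.
\end{aligned}
```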

Finally, let $\mu \in \mathcal{M}(\mathbb{R})$ be a general compactly supported measure. By the regularization method in [17, p. 216] we can choose a sequence $\{\mu_k\}$ of measures in $\mathcal{M}(\mathbb{R})$ with uniformly bounded compact supports such that $Q_{\mu_k}(x)$ is continuous on $\mathbb{R}$ for each $k$, $\mu_k \to \mu$ weakly and $\Sigma(\mu_k) \ge \Sigma(\mu)$ for all $k$. Hence, by Lemma 4.1 and the first case we have

$$W(\mu,\mu_Q) \le \liminf_{k\to\infty}W(\mu_k,\mu_Q) \le \liminf_{k\to\infty}\sqrt{\frac1\rho\tilde\Sigma_Q(\mu_k)} \le \sqrt{\frac1\rho\tilde\Sigma_Q(\mu)},$$

where the last inequality uses $\Sigma(\mu_k) \ge \Sigma(\mu)$ and $\int Q\,d\mu_k \to \int Q\,d\mu$, completing the proof. □

5. Free TCI for measures on T

In this section we will present the free analog of transportation cost inequalities for measures on $\mathbb{T}$. The idea of using special unitary random matrices is the same as before. In the following we consider two kinds of Wasserstein distances between probability measures $\mu, \nu \in \mathcal{M}(\mathbb{T})$. One is the Wasserstein distance with respect to the usual metric $|\zeta - \eta|$, $\zeta, \eta \in \mathbb{T}$, and the other is with respect to the geodesic distance (i.e., the angular distance) on $\mathbb{T}$. We write $W_{|\cdot|}(\mu,\nu)$ for the former and $W(\mu,\nu)$ for the latter. Of course, one has

$$W_{|\cdot|}(\mu,\nu) \le W(\mu,\nu),\qquad \mu,\nu \in \mathcal{M}(\mathbb{T}). \tag{5.1}$$

The next theorem is the free TCI for measures on $\mathbb{T}$ comparing the Wasserstein distance with the relative free entropy (1.17).

Theorem 5.1. Let $Q$ be a real-valued function on $\mathbb{T}$. If there exists a constant $\rho > -\frac12$ such that $Q(e^{\sqrt{-1}t}) - \frac\rho2 t^2$ is convex on $\mathbb{R}$, then

$$W_{|\cdot|}(\mu,\mu_Q) \le W(\mu,\mu_Q) \le \sqrt{\frac{2}{1+2\rho}\tilde\Sigma_Q(\mu)} \tag{5.2}$$

for every $\mu \in \mathcal{M}(\mathbb{T})$.

The special case where $Q = 0$ and $\rho = 0$ is

$$W_{|\cdot|}\Big(\mu,\frac{d\theta}{2\pi}\Big) \le W\Big(\mu,\frac{d\theta}{2\pi}\Big) \le \sqrt{-2\Sigma(\mu)},\qquad \mu \in \mathcal{M}(\mathbb{T}).$$

We need the next lemma to prove the theorem. Note that the lemma and its proof remain valid when $SU(n)$ is replaced by $U(n)$.

Lemma 5.2. Let $\tilde\mu, \tilde\nu \in \mathcal{M}(SU(n))$ and let $W(\tilde\mu,\tilde\nu)$ be the Wasserstein distance between $\tilde\mu, \tilde\nu$ with respect to the geodesic distance on $SU(n)$. Let $\mu, \nu$ be the mean eigenvalue distributions on $\mathbb{T}$ of $\tilde\mu, \tilde\nu$, respectively. Then

$$W(\mu,\nu) \le \frac{1}{\sqrt n}W(\tilde\mu,\tilde\nu).$$

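Proof. We use the symbol $d$ for the geodesic distance on $SU(n)$ as well as for that on $\mathbb{T}$. Define the optimal matching distance on $\mathbb{T}^n$ by

$$\delta(\zeta,\eta) := \min_{\sigma\in S_n}\Big(\sum_{i=1}^n d(\zeta_i,\eta_{\sigma(i)})^2\Big)^{1/2}$$

for $\zeta = (\zeta_1,\dots,\zeta_n)$, $\eta = (\eta_1,\dots,\eta_n) \in \mathbb{T}^n$. For $U \in SU(n)$ let $\lambda(U) := (\lambda_1(U),\dots,\lambda_n(U))$ denote the element of $\mathbb{T}^n$ consisting of the eigenvalues of $U$ with multiplicities and in counterclockwise order (i.e., $0 \le \arg\lambda_1(U) \le \cdots \le \arg\lambda_n(U) < 2\pi$). First, we prove

$$\delta(\lambda(U),\lambda(V)) \le d(U,V),\qquad U, V \in SU(n). \tag{5.3}$$

For $U, V \in SU(n)$ let $U(t)$ $(0 \le t \le 1)$ be the geodesic curve in $SU(n)$ connecting $U$ and $V$. By dividing the curve into several small pieces if necessary, we may assume that there is a smooth curve $A(t)$ $(0 \le t \le 1)$ in $\{A \in M_n^{sa} : \mathrm{Tr}_n(A) = 0\}$ such that $U(t) = e^{\sqrt{-1}A(t)}$ for $0 \le t \le 1$. Let $0 = t_0 < t_1 < \cdots < t_K = 1$ be any partition of $[0,1]$. For $1 \le k \le K$ we have

$$\delta\big(\lambda(U(t_{k-1})),\lambda(U(t_k))\big) \le \Big(\sum_{i=1}^n d\big(e^{\sqrt{-1}\lambda_i(A(t_{k-1}))},e^{\sqrt{-1}\lambda_i(A(t_k))}\big)^2\Big)^{1/2}$$
$$\le \Big(\sum_{i=1}^n\big|\lambda_i(A(t_{k-1})) - \lambda_i(A(t_k))\big|^2\Big)^{1/2} \le \|A(t_{k-1}) - A(t_k)\|_{HS} = d\big(U(t_{k-1}),U(t_k)\big) + o(t_k - t_{k-1}).$$

In the above, $\lambda_1(A),\dots,\lambda_n(A)$ are the eigenvalues of $A$ in increasing order, and the third inequality is due to the Lidskii-Wielandt majorization. Therefore,

$$\delta(\lambda(U),\lambda(V)) \le \sum_{k=1}^K\delta\big(\lambda(U(t_{k-1})),\lambda(U(t_k))\big) \le d(U,V) + \sum_{k=1}^K o(t_k - t_{k-1}),$$

so that (5.3) follows because $\sum_{k=1}^K o(t_k - t_{k-1}) \to 0$ as $\max_k(t_k - t_{k-1}) \to 0$.

Now, for each $U, V \in SU(n)$ let $\sigma_{U,V} \in S_n$ be such that

$$\delta(\lambda(U),\lambda(V)) = \Big(\sum_{i=1}^n d\big(\lambda_i(U),\lambda_{\sigma_{U,V}(i)}(V)\big)^2\Big)^{1/2}.$$

Of course, we can choose the map $(U,V) \in SU(n)\times SU(n) \mapsto \sigma_{U,V} \in S_n$ to be measurable. For every $\tilde\mu, \tilde\nu \in \mathcal{M}(SU(n))$ and $\tilde\pi \in \Pi(\tilde\mu,\tilde\nu)$, define $\pi \in \mathcal{M}(\mathbb{T}\times\mathbb{T})$ by

$$\pi(G) := \iint_{SU(n)\times SU(n)}\frac1n\#\big\{i : (\lambda_i(U),\lambda_{\sigma_{U,V}(i)}(V)) \in G\big\}\,d\tilde\pi(U,V)$$

for Borel sets $G \subset \mathbb{T}\times\mathbb{T}$. Since for $F \subset \mathbb{T}$

$$\pi(F\times\mathbb{T}) = \int_{SU(n)}\frac1n\#\{i : \lambda_i(U) \in F\}\,d\tilde\mu(U) = \mu(F),\qquad \pi(\mathbb{T}\times F) = \int_{SU(n)}\frac1n\#\{i : \lambda_i(V) \in F\}\,d\tilde\nu(V) = \nu(F),$$

we have $\pi \in \Pi(\mu,\nu)$, so that

$$W(\mu,\nu)^2 \le \iint_{\mathbb{T}\times\mathbb{T}}\frac12 d(\zeta,\eta)^2\,d\pi(\zeta,\eta) = \frac1n\iint_{SU(n)\times SU(n)}\frac12\sum_{i=1}^n d\big(\lambda_i(U),\lambda_{\sigma_{U,V}(i)}(V)\big)^2\,d\tilde\pi(U,V)$$
$$= \frac1n\iint_{SU(n)\times SU(n)}\frac12\delta(\lambda(U),\lambda(V))^2\,d\tilde\pi(U,V) \le \frac1n\iint_{SU(n)\times SU(n)}\frac12 d(U,V)^2\,d\tilde\pi(U,V)$$

thanks to (5.3). Taking the infimum over $\tilde\pi \in \Pi(\tilde\mu,\tilde\nu)$ gives $W(\mu,\nu)^2 \le \frac1n W(\tilde\mu,\tilde\nu)^2$. □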
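The optimal matching distance $\delta$ used in this proof is a finite minimization over $S_n$, so for small $n$ it can be evaluated by brute force. The sketch below is our own illustration (hypothetical helper names; points of $\mathbb{T}$ are encoded by their angles, an assumption of the sketch). It computes $\delta$ and illustrates two facts used above: $\delta$ does not depend on how the points are listed, so the ordering chosen for $\lambda(U)$ is immaterial, and it never exceeds the cost of any fixed matching, in particular the identity matching used in the first inequality of the chain.

```python
import itertools
import math

def arc(a, b):
    """Geodesic (angular) distance on T between e^{ia} and e^{ib}."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def matching_distance(zs, ws):
    """delta(zeta, eta) = min over sigma in S_n of (sum_i d(zeta_i, eta_sigma(i))^2)^(1/2),
    with points of T given by their angles; brute force over S_n, so only for small n."""
    n = len(zs)
    return math.sqrt(min(sum(arc(zs[i], ws[s[i]]) ** 2 for i in range(n))
                         for s in itertools.permutations(range(n))))

zs = [0.1, 2.0, 4.0]
ws = [6.2, 2.1, 3.9]
identity_cost = math.sqrt(sum(arc(z, w) ** 2 for z, w in zip(zs, ws)))
print(matching_distance(zs, ws), identity_cost)
```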

Proof of Theorem 5.1. The first inequality of (5.2) is obvious as noted in (5.1). To prove the second, we first assume:

(a) $Q$ is harmonic on a neighborhood of the unit disk;

(b) the function $Q_\mu(\zeta) := 2\int_{\mathbb{T}}\log|\zeta - \eta|\,d\mu(\eta)$ is finite and continuous on $\mathbb{T}$.

For each $n \in \mathbb{N}$ define $\lambda_n^{SU}(Q)$, $\lambda_n^{SU}(Q_\mu)$ and their mean eigenvalue distributions $\bar\lambda_n^{SU}(Q)$, $\bar\lambda_n^{SU}(Q_\mu)$ as in the proof of Theorem 3.3. Since $\bar\lambda_n^{SU}(Q) \to \mu_Q$ and $\bar\lambda_n^{SU}(Q_\mu) \to \mu$ weakly, Lemma 4.1 implies that

$$W(\mu,\mu_Q) \le \liminf_{n\to\infty}W\big(\bar\lambda_n^{SU}(Q_\mu),\bar\lambda_n^{SU}(Q)\big). \tag{5.4}$$

On the other hand, Lemma 5.2 gives

$$W\big(\bar\lambda_n^{SU}(Q_\mu),\bar\lambda_n^{SU}(Q)\big) \le \frac{1}{\sqrt n}W\big(\lambda_n^{SU}(Q_\mu),\lambda_n^{SU}(Q)\big). \tag{5.5}$$

Furthermore, since the function $\Psi_n(U) := n\mathrm{Tr}_n(Q(U))$ on $SU(n)$ satisfies the Bakry and Emery criterion (3.2), Theorem 4.3 implies that

$$W\big(\lambda_n^{SU}(Q_\mu),\lambda_n^{SU}(Q)\big) \le \sqrt{\frac{1}{\frac n2 + n\rho}S\big(\lambda_n^{SU}(Q_\mu),\lambda_n^{SU}(Q)\big)}. \tag{5.6}$$

The above (5.4)-(5.6) and (3.5) (see also Proposition 6.1 (1) in §6) together prove the second inequality of (5.2) under the assumptions (a) and (b).
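Explicitly, with $\bar\lambda_n^{SU}$ denoting mean eigenvalue distributions, the constant $2/(1+2\rho)$ in (5.2) arises from chaining the three inequalities as follows (a restatement of the step just described, using $\frac n2 + n\rho = \frac{n(1+2\rho)}{2}$).

```latex
\begin{aligned}
W(\mu,\mu_Q)^2
&\le \liminf_{n\to\infty} W\big(\bar\lambda_n^{SU}(Q_\mu),\bar\lambda_n^{SU}(Q)\big)^2
  && \text{by (5.4)}\\
&\le \liminf_{n\to\infty} \frac{1}{n}\, W\big(\lambda_n^{SU}(Q_\mu),\lambda_n^{SU}(Q)\big)^2
  && \text{by (5.5)}\\
&\le \liminf_{n\to\infty} \frac{1}{n}\cdot\frac{2}{n(1+2\rho)}\, S\big(\lambda_n^{SU}(Q_\mu),\lambda_n^{SU}(Q)\big)
  && \text{by (5.6)}\\
&= \frac{2}{1+2\rho}\lim_{n\to\infty}\frac{1}{n^2}\, S\big(\lambda_n^{SU}(Q_\mu),\lambda_n^{SU}(Q)\big)
 = \frac{2}{1+2\rho}\,\tilde\Sigma_Q(\mu)
  && \text{by (3.5)}.
\end{aligned}
```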

Next, let $Q$ be as stated in the theorem (hence $Q$ is continuous on $\mathbb{T}$) and let $\mu \in \mathcal{M}(\mathbb{T})$ be general. For $0 < r < 1$ let the Poisson integrals $Q_r$, $p_r$ and $\mu_r$ be as in the proof of Theorem 3.3. Since $Q_r$ and $\mu_r$ satisfy (a) and (b) above, the case already shown implies that

$$W(\mu_r,\mu_{Q_r}) \le \sqrt{\frac{2}{1+2\rho}\tilde\Sigma_{Q_r}(\mu_r)}. \tag{5.7}$$

Moreover, as in the proof of Theorem 3.3, we have $\|Q_r - Q\|_\infty \to 0$, $B(Q_r) \to B(Q)$ and $\tilde\Sigma_{Q_r}(\mu_r) \to \tilde\Sigma_Q(\mu)$ as $r \nearrow 1$. Choose any sequence $0 < r(k) < 1$ with $r(k) \to 1$ such that $\mu_{Q_{r(k)}} \to \mu_0 \in \mathcal{M}(\mathbb{T})$ weakly. By the upper semicontinuity of $\Sigma$ we get

$$0 \le \tilde\Sigma_Q(\mu_0) \le \liminf_{k\to\infty}\tilde\Sigma_{Q_{r(k)}}(\mu_{Q_{r(k)}}) = 0,$$

so that $\mu_0 = \mu_Q$ by the uniqueness of the minimizer. This shows that $\mu_{Q_r} \to \mu_Q$ weakly as $r \nearrow 1$ and

$$W(\mu,\mu_Q) \le \liminf_{r\nearrow1}W(\mu_r,\mu_{Q_r})$$

thanks to Lemma 4.1. Hence, the desired inequality finally follows by taking the limit of (5.7). □

6. Concluding remarks 
In this section we collect some remarks, examples and supplementary results. 

6.1. Use of special orthogonal random matrices. For a real-valued continuous function $Q$, an $n\times n$ special orthogonal random matrix $\lambda_n^{SO}(Q)$ is defined by

$$d\lambda_n^{SO}(Q)(V) := \frac{1}{\tilde Z_n^{SO}(Q)}\exp\Big(-\frac n2\mathrm{Tr}_n(Q(V))\Big)\,dV,$$

where $dV$ is the Haar probability measure on the special orthogonal group $SO(n)$. The joint eigenvalue distribution on $\mathbb{T}^{n-1}$ of $\lambda_n^{SO}(Q)$ is

$$d\tilde\lambda_n^{SO}(Q)(\zeta_1,\dots,\zeta_{n-1}) = \frac{1}{Z_n^{SO}(Q)}\exp\Big(-\frac n2\sum_{i=1}^n Q(\zeta_i)\Big)\prod_{1\le i<j\le n}|\zeta_i - \zeta_j|\prod_{i=1}^{n-1}d\zeta_i$$

with $\zeta_n = (\zeta_1\cdots\zeta_{n-1})^{-1}$.

The large deviation principle is analogous to Theorem 1.2; the rate function is just $\frac12\tilde\Sigma_Q(\mu)$ and its minimizer is the same $\mu_Q$. On the other hand, note that the Ricci curvature tensor of $SO(n)$ is

$$\mathrm{Ric}(SO(n)) = \frac{n-2}{4}I_{n(n-1)/2},$$

and the Bakry and Emery criterion in place of (3.2) is

$$\mathrm{Ric}(SO(n)) + \mathrm{Hess}(\Psi_n) \ge \Big(\frac{n-2}{4} + \frac{n\rho}{2}\Big)I_{n(n-1)/2},$$

where $\Psi_n(V) := \frac n2\mathrm{Tr}_n(Q(V))$ for $V \in SO(n)$. In this way, a special orthogonal random matrix model can be used as well to obtain the free LSI in Theorem 3.3 and the free TCI in Theorem 5.1. Similarly, the free TCI in Theorem 4.5 can be shown by using a real symmetric random matrix model

$$d\lambda_n^{\mathbb{R}}(Q)(T) := \frac{1}{Z_n^{\mathbb{R}}(Q)}\exp\Big(-\frac n2\mathrm{Tr}_n(Q(T))\Big)\,dT,$$

where $dT := \prod_{i\le j}dT_{ij}$ on $M_n(\mathbb{R})^{sa} \cong \mathbb{R}^{n(n+1)/2}$.

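6.2. Some computations. Let $Q(x) := \rho x^2/2$ on $\mathbb{R}$ with $\rho > 0$. The equilibrium measure associated with $Q$ is the semicircular measure $\gamma_{0,2/\sqrt\rho}$. For $\alpha > 0$ we compute

$$\tilde\Sigma_Q(\gamma_{0,2/\sqrt\alpha}) = \frac{\rho}{2\alpha} + \frac12\log\alpha - \frac12\log\rho - \frac12,\qquad \Phi_Q(\gamma_{0,2/\sqrt\alpha}) = \frac{(\alpha - \rho)^2}{\alpha}.$$

Since

$$\lim_{\alpha\to\rho}\frac{\tilde\Sigma_Q(\gamma_{0,2/\sqrt\alpha})}{\Phi_Q(\gamma_{0,2/\sqrt\alpha})} = \frac{1}{4\rho},$$

we notice that the bound $1/(2\rho)$ in the free LSI (2.2) cannot be replaced by anything smaller than $1/(4\rho)$; however, it is unknown whether $1/(2\rho)$ is the best possible bound or not.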
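The two closed forms above make both claims easy to check numerically: the free LSI (2.2), $\tilde\Sigma_Q \le \frac{1}{2\rho}\Phi_Q$, holds along this family, and the ratio approaches $1/(4\rho)$ as $\alpha \to \rho$. Writing $u = \alpha/\rho$, one finds $\tilde\Sigma_Q - \frac{1}{2\rho}\Phi_Q = \frac12(\log u - u + 1) \le 0$. The sketch below simply evaluates the displayed formulas; the function names are ours.

```python
import math

def rel_free_entropy(alpha, rho):
    """Sigma~_Q(gamma_{0,2/sqrt(alpha)}) for Q(x) = rho x^2/2, as displayed in 6.2."""
    return rho / (2 * alpha) + 0.5 * math.log(alpha) - 0.5 * math.log(rho) - 0.5

def rel_free_fisher(alpha, rho):
    """Phi_Q(gamma_{0,2/sqrt(alpha)}) = (alpha - rho)^2 / alpha, as displayed in 6.2."""
    return (alpha - rho) ** 2 / alpha

for rho in (0.5, 1.0, 2.0):
    for alpha in (0.1, 0.5, 1.0, 3.0, 10.0):
        # free LSI (2.2): Sigma~_Q <= Phi_Q / (2 rho)
        assert rel_free_entropy(alpha, rho) <= rel_free_fisher(alpha, rho) / (2 * rho) + 1e-12
    near = rho * (1 + 1e-4)
    print(rho, rel_free_entropy(near, rho) / rel_free_fisher(near, rho), 1 / (4 * rho))
```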

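For $2 \le \lambda < \infty$ the equilibrium measure associated with $Q(\zeta) := -(2/\lambda)\mathrm{Re}\,\zeta$ on $\mathbb{T}$ is

$$\nu_\lambda := \Big(1 + \frac2\lambda\cos\theta\Big)\frac{d\theta}{2\pi}\qquad\Big(\text{with } \nu_\infty = \frac{d\theta}{2\pi}\Big) \tag{6.1}$$

and $\Sigma(\nu_\lambda) = -1/\lambda^2$ (see [17, 5.3.10]). When $4 < \lambda < \infty$, since $Q(e^{\sqrt{-1}t}) + \frac{t^2}{\lambda} = \frac2\lambda\big(\frac{t^2}{2} - \cos t\big)$ is convex on $\mathbb{R}$ (i.e., $\rho = -2/\lambda$), the free LSI (3.1) holds with $1/(1+2\rho) = \lambda/(\lambda - 4)$. For example, for $2 \le \alpha < \infty$ we compute

$$\tilde\Sigma_Q(\nu_\alpha) = \Big(\frac1\alpha - \frac1\lambda\Big)^2,\qquad F_Q(\nu_\alpha) = 2\Big(\frac1\alpha - \frac1\lambda\Big)^2.$$

Again, the optimality of the bound $1/(1+2\rho)$ in (3.1) is unknown.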
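The two closed forms for $\nu_\alpha$ can be cross-checked by quadrature. The sketch below (helper names are ours) relies on facts we supply rather than quote: $-\Sigma(\mu) = \sum_{k\ge1}|\hat\mu(k)|^2/k$, which follows from $\log|1 - e^{\sqrt{-1}u}| = -\sum_{k\ge1}\cos(ku)/k$; $B(Q) = -E_Q(\nu_\lambda)$ since $\nu_\lambda$ is the equilibrium measure; and $(Hp)(e^{\sqrt{-1}\theta}) = \frac2\alpha\sin\theta$ for $p = 1 + \frac2\alpha\cos\theta$, by Lemma 3.2 (i) applied to $Q_{\nu_\alpha}(e^{\sqrt{-1}\theta}) = -\frac2\alpha\cos\theta$. It also confirms the free LSI (3.1) with constant $\lambda/(\lambda-4)$ on this family.

```python
import math

def circle_mean(f, N=256):
    """Average of f over [0, 2pi) by the midpoint rule (exact for trigonometric polynomials of degree < N)."""
    return sum(f((j + 0.5) * 2 * math.pi / N) for j in range(N)) / N

def weighted_energy(alpha, lam):
    """E_Q(nu_alpha) = -Sigma(nu_alpha) + int Q dnu_alpha for Q(e^{i theta}) = -(2/lam) cos theta."""
    p = lambda th: 1 + (2 / alpha) * math.cos(th)
    c1 = circle_mean(lambda th: math.cos(th) * p(th))  # first Fourier coefficient of nu_alpha
    minus_sigma = c1 ** 2                               # -Sigma = sum_k |hat nu(k)|^2 / k; only k = 1 survives
    return minus_sigma + circle_mean(lambda th: -(2 / lam) * math.cos(th) * p(th))

def rel_free_entropy(alpha, lam):
    """Sigma~_Q(nu_alpha) = E_Q(nu_alpha) + B(Q) with B(Q) = -E_Q(nu_lam)."""
    return weighted_energy(alpha, lam) - weighted_energy(lam, lam)

def rel_free_fisher(alpha, lam):
    """F_Q(nu_alpha) with (Hp)(theta) = (2/alpha) sin theta and Q'(theta) = (2/lam) sin theta."""
    p = lambda th: 1 + (2 / alpha) * math.cos(th)
    hp = lambda th: (2 / alpha) * math.sin(th)
    dq = lambda th: (2 / lam) * math.sin(th)
    return (circle_mean(lambda th: (hp(th) - dq(th)) ** 2 * p(th))
            - circle_mean(lambda th: dq(th) * p(th)) ** 2)

for lam in (5.0, 8.0, 20.0):
    for alpha in (2.0, 3.0, 10.0):
        s, f = rel_free_entropy(alpha, lam), rel_free_fisher(alpha, lam)
        print(lam, alpha, s, (1 / alpha - 1 / lam) ** 2, f, 2 * (1 / alpha - 1 / lam) ** 2)
        assert s <= lam / (lam - 4) * f + 1e-12  # free LSI (3.1) with 1/(1 + 2 rho) = lam/(lam - 4)
```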

Concerning the free TCI, it does not seem easy to compute the Wasserstein distance exactly; in fact, we do not know the exact value of $W(\gamma_{0,r_1},\gamma_{0,r_2})$, for instance.

6.3. Classical TCI vs. free TCI. Both classical and free TCIs are formulated in terms of the same (quadratic) Wasserstein distance for measures, and thus it seems interesting to compare them. However, in the case of measures on $\mathbb{R}$, the natural reference measures are Gaussian (not compactly supported) in the classical case, while they are semicircular (compactly supported) in the free case, and hence the question is irrelevant in this case. In the case of the uniform probability measure $d\theta/2\pi$ on $\mathbb{T}$, our free TCI is

$$W\Big(\mu,\frac{d\theta}{2\pi}\Big) \le \sqrt{-2\Sigma(\mu)},\qquad \mu \in \mathcal{M}(\mathbb{T}),$$

while to the authors' best knowledge the sharpest classical TCI is

$$W\Big(\mu,\frac{d\theta}{2\pi}\Big) \le \sqrt{S\Big(\mu,\frac{d\theta}{2\pi}\Big)},\qquad \mu \in \mathcal{M}(\mathbb{T}).$$

(The latter inequality is seen as follows. It is known (see [21, p. 94]) that the "spectral gap" and the "logarithmic Sobolev constant" are the same number 1, and [25, Theorem 1] implies the desired inequality.) Now, if the relative free entropy happened to dominate the (usual) relative entropy up to a positive constant, then a free TCI would immediately follow from the classical one. However, this is not the case, and we indeed have the following examples:

(1) For an arbitrary $k \in \mathbb{N}$ and for large $n \in \mathbb{N}$, let us choose $k$ disjoint intervals $[a_j(n),b_j(n)]$, $1 \le j \le k$, in $\mathbb{T} = [0,2\pi)$ whose lengths are all $2\pi/kn$ and whose center points are fixed independently of $n$. Consider $\mu_k(n) \in \mathcal{M}(\mathbb{T})$ whose density with respect to $d\theta/2\pi$ is $\sum_{j=1}^k n\chi_{[a_j(n),b_j(n)]}$. Then we have

$$S\Big(\mu_k(n),\frac{d\theta}{2\pi}\Big) = \log n.$$

On the other hand, by a straightforward computation we see that, for a sufficiently large $n_0 \in \mathbb{N}$, there are constants $c_k < C_k$ depending only on $k$ such that

$$c_k + \frac{\log n}{k} \le -\Sigma(\mu_k(n)) \le C_k + \frac{\log n}{k}\qquad\text{for } n \ge n_0,$$

and thus

$$\frac{-\Sigma(\mu_k(n))}{S\big(\mu_k(n),\frac{d\theta}{2\pi}\big)} \longrightarrow \frac1k\qquad\text{as } n \to \infty.$$

The computation is somewhat similar to a free entropy dimension computation for single variables; see [27, Proposition 6.1] for example.

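(2) For the measure $\nu_\lambda$ $(2 \le \lambda < \infty)$ in (6.1), with the help of a table of integration formulas, we can compute

$$S\Big(\nu_\lambda,\frac{d\theta}{2\pi}\Big) = \int_0^{2\pi}\Big(1 + \frac2\lambda\cos\theta\Big)\log\Big(1 + \frac2\lambda\cos\theta\Big)\frac{d\theta}{2\pi} = 1 - \sqrt{1 - \frac{4}{\lambda^2}} + \log\frac{1 + \sqrt{1 - 4/\lambda^2}}{2},$$

and hence we get

$$\frac{-\Sigma(\nu_\lambda)}{S\big(\nu_\lambda,\frac{d\theta}{2\pi}\big)} \longrightarrow 1\qquad\text{as } \lambda \to \infty.$$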
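The closed form for $S(\nu_\lambda, d\theta/2\pi)$ in example (2) can be confirmed by quadrature, and since $-\Sigma(\nu_\lambda) = 1/\lambda^2$ the ratio in (2) can be tabulated directly. The sketch below does both; the function names and the use of the midpoint rule are our choices. Expanding in $a = 2/\lambda$ gives $S(\nu_\lambda, d\theta/2\pi) = a^2/4 + O(a^4)$ while $-\Sigma(\nu_\lambda) = a^2/4$, which is why the ratio tends to 1.

```python
import math

def rel_entropy_nu_quad(lam, N=4096):
    """S(nu_lam, dtheta/2pi) = average over the circle of p log p, p = 1 + (2/lam) cos theta."""
    total = 0.0
    for j in range(N):
        p = 1 + (2 / lam) * math.cos((j + 0.5) * 2 * math.pi / N)
        total += p * math.log(p)
    return total / N

def rel_entropy_nu_closed(lam):
    """Closed form 1 - sqrt(1 - 4/lam^2) + log((1 + sqrt(1 - 4/lam^2)) / 2), valid for lam >= 2."""
    r = math.sqrt(1 - 4 / lam ** 2)
    return 1 - r + math.log((1 + r) / 2)

for lam in (2.5, 3.0, 5.0, 10.0, 100.0, 1000.0):
    ratio = (1 / lam ** 2) / rel_entropy_nu_closed(lam)  # -Sigma(nu_lam) = 1/lam^2
    print(lam, rel_entropy_nu_quad(lam), rel_entropy_nu_closed(lam), ratio)
```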



These examples tell us that the minus free entropy $-\Sigma(\mu)$ cannot be compared with the relative entropy $S(\mu,d\theta/2\pi)$.

6.4. Scaling limit formulas for relative free entropy and relative free Fisher information. It seems worthwhile to state some scaling limit formulas given in the proofs of the main theorems as separate propositions, saying that the relative entropy and the Fisher information of relevant random matrices asymptotically converge to the corresponding free analogs for limiting measures. In fact, the formulas for relative free entropy were essentially obtained in [13].

The proof of (3.5) gives (1) of the next proposition, while that of (3.6) gives (2), because Lemma 1.3 shows that the derivative formula in Lemma 3.1 (ii) is still valid for any $U \in SU(n)$ when $Q$ is a real-valued $C^1$ function on $\mathbb{T}$. The unitary versions are similar.

Proposition 6.1. (1) Let $Q$ be a real-valued continuous function on $\mathbb{T}$, and $\mu \in \mathcal{M}(\mathbb{T})$. If $Q_\mu(\zeta) := 2\int_{\mathbb{T}}\log|\zeta - \eta|\,d\mu(\eta)$ is finite and continuous on $\mathbb{T}$, then

$$\tilde\Sigma_Q(\mu) = \lim_{n\to\infty}\frac{1}{n^2}S\big(\lambda_n^{SU}(Q_\mu),\lambda_n^{SU}(Q)\big) = \lim_{n\to\infty}\frac{1}{n^2}S\big(\lambda_n^{U}(Q_\mu),\lambda_n^{U}(Q)\big).$$

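(2) In addition, if $\mu$ has a continuous density $p = d\mu/d\zeta$, and both $Q$ and $Q_\mu$ are $C^1$ functions on $\mathbb{T}$, then

$$F_Q(\mu) = \lim_{n\to\infty}\frac{1}{n^3}\int_{SU(n)}\Big\|\nabla\log\frac{d\lambda_n^{SU}(Q_\mu)}{d\lambda_n^{SU}(Q)}(U)\Big\|_{HS}^2\,d\lambda_n^{SU}(Q_\mu)(U)$$

and

$$\int_{\mathbb{T}}\big((Hp)(\zeta) - Q'(\zeta)\big)^2\,d\mu(\zeta) = \lim_{n\to\infty}\frac{1}{n^3}\int_{U(n)}\Big\|\nabla\log\frac{d\lambda_n^{U}(Q_\mu)}{d\lambda_n^{U}(Q)}(U)\Big\|_{HS}^2\,d\lambda_n^{U}(Q_\mu)(U).$$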

Similar limit formulas also hold in the real line case. The formula in (1) below is (4.7). The proof of (2) is more or less similar to the circle case; here the fact that $Q_\mu'(x) = 2(Hp)(x)$ for a.e. $x \in \mathbb{R}$ (where $p = d\mu/dx$) is needed in place of Lemma 3.2 (i). The details are left to the reader. Note that the limits in both formulas are independent of the choice of $R$ such that $\mu$ is supported in $[-R, R]$. Although the assumption that $Q_\mu$ is $C^1$ on $\mathbb{R}$ seems rather strong, there are many such examples (see [26, §IV.5]).

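Proposition 6.2. (1) Let $Q$ be a real-valued continuous function on $\mathbb{R}$ satisfying (1.8), and let $\mu \in \mathcal{M}(\mathbb{R})$ be supported in $[-R, R]$. If $Q_\mu(x) := 2\int_{\mathbb{R}} \log|x - y|\,d\mu(y)$ is finite and continuous on $\mathbb{R}$, then

$$\tilde\Sigma_Q(\mu) = \lim_{n\to\infty}\frac{1}{n^2}S\bigl(\lambda_n(Q_\mu; R), \lambda_n(Q)\bigr).$$

(2) In addition, if $\mu$ has a continuous density $d\mu/dx$ and both $Q$ and $Q_\mu$ are $C^1$ functions on $\mathbb{R}$, then

$$\tilde\Phi_Q(\mu) = \lim_{n\to\infty}\frac{1}{n^3}\int\Bigl\|\nabla\log\frac{d\lambda_n(Q_\mu; R)}{d\lambda_n(Q)}(A)\Bigr\|_{HS}^2\,d\lambda_n(Q_\mu; R)(A).$$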



6.5. Free LSI for measures on $\mathbb{R}_+$. The free LSI (2.2) applies in particular to measures supported in $\mathbb{R}_+ = [0, \infty)$, but we can also show a different inequality, which might be regarded as a proper free LSI when the whole space is $\mathbb{R}_+$ instead of $\mathbb{R}$. Let $\mathcal{M}_s(\mathbb{R})$ be the set of symmetric probability measures on $\mathbb{R}$. Consider the bijective transformation $\mu \in \mathcal{M}(\mathbb{R}_+) \mapsto \hat\mu \in \mathcal{M}_s(\mathbb{R})$ defined by

$$\mu(F) = \hat\mu(\{x \in \mathbb{R} : x^2 \in F\}) \quad \text{for } F \subset \mathbb{R}_+.$$
When $\mu \in \mathcal{M}(\mathbb{R}_+)$ has the density $p = d\mu/dx$ on $\mathbb{R}_+$, the measure $\hat\mu$ has the density $\hat p = d\hat\mu/dx$ on $\mathbb{R}$, and

$$\hat p(x) = |x|\,p(x^2), \quad x \in \mathbb{R}; \qquad p(x) = \frac{\hat p(\sqrt{x})}{\sqrt{x}}, \quad x > 0.$$

Lemma 6.3. Let $f$ be a measurable function on $\mathbb{R}_+$ and set $\hat f(x) := |x|\,f(x^2)$ for $x \in \mathbb{R}$. Then $\hat f \in L^3(\mathbb{R}, dx)$ if and only if $f \in L^3(\mathbb{R}_+, x\,dx)$. If this is the case, then the Hilbert transform $(Hf)(x)$ exists for a.e. $x \in \mathbb{R}_+$ and $(H\hat f)(x) = x\,(Hf)(x^2)$ for a.e. $x \in \mathbb{R}$.

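Proof. The first assertion holds because $\int_{\mathbb{R}} |\hat f(x)|^3\,dx = \int_{\mathbb{R}_+} x\,|f(x)|^3\,dx$. Suppose $f \in L^3(\mathbb{R}_+, x\,dx)$; then $(H\hat f)(x)$ exists for a.e. $x \in \mathbb{R}$. For every $x > 0$ and $0 < \varepsilon < x^2$, substituting $t = s^2$ we compute

$$
\begin{aligned}
x\left(\int_0^{x^2-\varepsilon} + \int_{x^2+\varepsilon}^{\infty}\right)\frac{f(t)}{x^2 - t}\,dt
&= \left(\int_0^{\sqrt{x^2-\varepsilon}} + \int_{\sqrt{x^2+\varepsilon}}^{\infty}\right)\hat f(s)\left(\frac{1}{x+s} + \frac{1}{x-s}\right)ds \\
&= \int_{|s-x| \ge x-\sqrt{x^2-\varepsilon}}\frac{\hat f(s)}{x-s}\,ds
 - \int_{-\sqrt{x^2+\varepsilon}}^{-\sqrt{x^2-\varepsilon}}\frac{\hat f(s)}{x-s}\,ds
 + \int_{\sqrt{x^2+\varepsilon}}^{2x-\sqrt{x^2-\varepsilon}}\frac{\hat f(s)}{x-s}\,ds.
\end{aligned}
$$

The first of the last three terms is the principal value integral converging to $(H\hat f)(x)$ as $\varepsilon \searrow 0$ for a.e. $x > 0$, while the second and the third terms converge to $0$ as $\varepsilon \searrow 0$. Indeed,

$$\left|\int_{-\sqrt{x^2+\varepsilon}}^{-\sqrt{x^2-\varepsilon}}\frac{\hat f(s)}{x-s}\,ds\right| \le \frac{1}{x+\sqrt{x^2-\varepsilon}}\int_{-\sqrt{x^2+\varepsilon}}^{-\sqrt{x^2-\varepsilon}}|\hat f(s)|\,ds \longrightarrow 0$$

and, by Hölder's inequality,

$$
\begin{aligned}
\left|\int_{\sqrt{x^2+\varepsilon}}^{2x-\sqrt{x^2-\varepsilon}}\frac{\hat f(s)}{x-s}\,ds\right|
&\le \left(\int_{\mathbb{R}}|\hat f(s)|^3\,ds\right)^{1/3}\left(\int_{\sqrt{x^2+\varepsilon}}^{2x-\sqrt{x^2-\varepsilon}}\frac{ds}{(s-x)^{3/2}}\right)^{2/3} \\
&= \left(\int_{\mathbb{R}}|\hat f(s)|^3\,ds\right)^{1/3}\left(\frac{2}{\sqrt{\varepsilon}}\Bigl((\sqrt{x^2+\varepsilon}+x)^{1/2} - (x+\sqrt{x^2-\varepsilon})^{1/2}\Bigr)\right)^{2/3} \longrightarrow 0
\end{aligned}
$$

as $\varepsilon \searrow 0$. Therefore, $(Hf)(x^2)$ exists and $(H\hat f)(x) = x\,(Hf)(x^2)$ for a.e. $x > 0$. Moreover, since $\hat f$ is even, $(H\hat f)(x) = -(H\hat f)(-x) = x\,(Hf)(x^2)$ for a.e. $x < 0$ as well. □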
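Lemma 6.3 can be checked on a concrete example. We assume the convention $(Hf)(x) = \mathrm{p.v.}\int f(y)/(x-y)\,dy$ without a factor $1/\pi$, which is the convention consistent with $Q_\mu' = 2Hp$ used in this section. For $f = \chi_{[0,1]}$ on $\mathbb{R}_+$ one has $\hat f(s) = |s|\chi_{[-1,1]}(s)$ and $(Hf)(t) = \log|t/(t-1)|$, so the lemma predicts $(H\hat f)(x) = x\log|x^2/(x^2-1)|$; a direct computation confirms this closed form. The Python sketch below evaluates $(H\hat f)(x)$ by numerical principal-value integration, subtracting the value of $|s|$ at $s = x$ to remove the singularity, and compares it with $x(Hf)(x^2)$. The function names are hypothetical.

```python
import math


def pv_hilbert(g, a, b, x, n=100_000):
    """Principal value of the integral of g(s)/(x - s) over [a, b] (no 1/pi factor).

    Writes g(s) = (g(s) - g(x)) + g(x): the first part has a bounded integrand
    for Lipschitz g, and the second has the closed form
    p.v. int_a^b ds/(x - s) = log|x - a| - log|x - b|.  Requires x != a, b.
    """
    h = (b - a) / n
    gx = g(x)
    total = 0.0
    for k in range(n):
        s = a + (k + 0.5) * h
        if s != x:  # a node exactly at x would be a single bounded term; skip it
            total += (g(s) - gx) / (x - s)
    return h * total + gx * (math.log(abs(x - a)) - math.log(abs(x - b)))


def Hf(t):
    """(Hf)(t) for f = indicator of [0, 1]: log|t / (t - 1)|, valid for t != 0, 1."""
    return math.log(abs(t / (t - 1.0)))


def Hf_hat(x):
    """Numerical (H f_hat)(x), where f_hat(s) = |s| on [-1, 1] and 0 elsewhere."""
    return pv_hilbert(abs, -1.0, 1.0, x)


for x in (0.5, -0.5, 0.3, 2.0):
    # Lemma 6.3 predicts (H f_hat)(x) = x (Hf)(x^2)
    print(f"x={x:5.2f}  numeric={Hf_hat(x):.8f}  lemma={x * Hf(x * x):.8f}")
```

The subtraction trick keeps the quadrature elementary; the oddness $(H\hat f)(-x) = -(H\hat f)(x)$ used at the end of the proof is visible in the output for $x = \pm 0.5$.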

Let $Q$ be a real-valued $C^1$ function on $\mathbb{R}_+$. For each $\mu \in \mathcal{M}(\mathbb{R}_+)$ we define the "relative free Fisher information" $\Phi_Q^+(\mu)$ by

$$\Phi_Q^+(\mu) := 4\int_{\mathbb{R}_+} x\Bigl((Hp)(x) - \frac12 Q'(x)\Bigr)^2 d\mu(x)$$

when $\mu$ has a density $p = d\mu/dx$ belonging to $L^3(\mathbb{R}_+, x\,dx)$, and $\Phi_Q^+(\mu) := +\infty$ otherwise. In particular, the "free Fisher information" $\Phi^+(\mu)$ is defined as $\Phi_Q^+(\mu)$ with $Q = 0$, i.e.,

$$\Phi^+(\mu) := 4\int_{\mathbb{R}_+} x\,\bigl((Hp)(x)\bigr)^2 d\mu(x).$$



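On the other hand, let $Q$ be a real-valued continuous function on $\mathbb{R}_+$ such that

$$\lim_{x\to+\infty} x\exp(-\varepsilon Q(x)) = 0 \quad \text{for any } \varepsilon > 0.$$

We define the "relative free entropy" $\Sigma_Q^+(\mu)$ of $\mu \in \mathcal{M}(\mathbb{R}_+)$ as

$$\Sigma_Q^+(\mu) := -\Sigma(\mu) + \int_{\mathbb{R}_+} Q(x)\,d\mu(x) + B^+(Q),$$

where

$$B^+(Q) := \lim_{n\to\infty}\frac{1}{n^2}\log\int_{\mathbb{R}_+}\!\cdots\!\int_{\mathbb{R}_+}\exp\Bigl(-n\sum_{i=1}^{n}Q(x_i)\Bigr)\prod_{i<j}(x_i - x_j)^2\prod_{i=1}^{n}dx_i.$$

In fact, similarly to the real line case in §1.4, the function $\Sigma_Q^+(\mu)$ on $\mathcal{M}(\mathbb{R}_+)$ is the good rate function of the large deviation principle for the empirical eigenvalue distribution of the $n \times n$ positive random matrix with distribution

$$d\lambda_n^+(Q)(A) := \frac{1}{Z_n^+(Q)}\exp\bigl(-n\,\mathrm{Tr}_n(Q(A))\bigr)\,\chi_{\{A \ge 0\}}(A)\,dA,$$

where $Z_n^+(Q)$ is the normalizing constant.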

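Proposition 6.4. Let $Q$ be a real-valued convex continuous function on $\mathbb{R}_+$ such that $Q$ is $C^1$ on $(0, \infty)$ and $Q'(x) \ge \rho$ for all $x > 0$ with a constant $\rho > 0$. Then, for every $\mu \in \mathcal{M}(\mathbb{R}_+)$ one has

$$\Sigma_Q^+(\mu) \le \frac{1}{\rho}\Phi_Q^+(\mu). \tag{6.2}$$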

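Proof. Define $\tilde Q(x) := \frac12 Q(x^2)$ for $x \in \mathbb{R}$; then it is easy to check that $\tilde Q$ is a $C^1$ function on $\mathbb{R}$ and $\tilde Q(x) - \frac{\rho}{2}x^2$ is convex on $\mathbb{R}$. For each $\mu \in \mathcal{M}(\mathbb{R}_+)$ we can apply Theorem 2.2 to $\hat\mu \in \mathcal{M}_s(\mathbb{R})$ defined as above, so that

$$\tilde\Sigma_{\tilde Q}(\hat\mu) \le \frac{1}{2\rho}\tilde\Phi_{\tilde Q}(\hat\mu).$$

Now, it suffices to show that

$$\Phi_Q^+(\mu) = \tilde\Phi_{\tilde Q}(\hat\mu), \tag{6.3}$$

$$\Sigma_Q^+(\mu) = 2\tilde\Sigma_{\tilde Q}(\hat\mu). \tag{6.4}$$

To prove (6.3), we may assume that $\mu$ has a density $p = d\mu/dx \in L^3(\mathbb{R}_+, x\,dx)$. Letting $\hat p = d\hat\mu/dx \in L^3(\mathbb{R}, dx)$, we get by Lemma 6.3

$$\Phi_Q^+(\mu) = 4\int_{\mathbb{R}_+} x\Bigl((Hp)(x) - \frac12 Q'(x)\Bigr)^2 p(x)\,dx = 4\int_{\mathbb{R}}\Bigl((H\hat p)(x) - \frac12\tilde Q'(x)\Bigr)^2\hat p(x)\,dx = \tilde\Phi_{\tilde Q}(\hat\mu).$$

For every $\mu \in \mathcal{M}(\mathbb{R}_+)$ we have $\Sigma(\mu) = 2\Sigma(\hat\mu)$ (see [17, p. 198]) and $\int_{\mathbb{R}_+} Q(x)\,d\mu(x) = 2\int_{\mathbb{R}}\tilde Q(x)\,d\hat\mu(x)$. For each $\nu \in \mathcal{M}(\mathbb{R})$, defining $\nu' \in \mathcal{M}(\mathbb{R})$ by $\nu'(F) := \nu(-F)$, we get $\tilde\Sigma_{\tilde Q}((\nu + \nu')/2) \le \tilde\Sigma_{\tilde Q}(\nu)$ by the concavity of free entropy (see [17, p. 193]). These facts show that the equilibrium measure $\mu_{\tilde Q}$ associated with $\tilde Q$ coincides with $\hat\mu_Q$, where $\mu_Q$ is the unique minimizer of $\Sigma_Q^+(\mu)$. Therefore, $B^+(Q) = 2B(\tilde Q)$ and $\Sigma_Q^+(\mu) = 2\tilde\Sigma_{\tilde Q}(\hat\mu)$ for all $\mu \in \mathcal{M}(\mathbb{R}_+)$. □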
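For the reader's convenience, the change of variables behind (6.3), and one way to see the identity $\Sigma(\mu) = 2\Sigma(\hat\mu)$ quoted from [17], can be written out as follows. This is a sketch: it only uses $\hat p(x) = |x|p(x^2)$, $\tilde Q'(x) = xQ'(x^2)$, Lemma 6.3 in the form $(H\hat p)(x) = x(Hp)(x^2)$, the substitution $t = x^2$, and the invariance of $\hat\mu$ under $y \mapsto -y$.

```latex
\begin{aligned}
4\int_{\mathbb{R}}\Bigl((H\hat p)(x) - \tfrac12\tilde Q'(x)\Bigr)^2\hat p(x)\,dx
&= 4\int_{\mathbb{R}} x^2\Bigl((Hp)(x^2) - \tfrac12 Q'(x^2)\Bigr)^2 |x|\,p(x^2)\,dx \\
&= 8\int_0^\infty x^3\Bigl((Hp)(x^2) - \tfrac12 Q'(x^2)\Bigr)^2 p(x^2)\,dx \\
&= 4\int_0^\infty t\Bigl((Hp)(t) - \tfrac12 Q'(t)\Bigr)^2 p(t)\,dt
   \qquad (t = x^2,\ dt = 2x\,dx), \\[6pt]
\Sigma(\mu) = \iint_{\mathbb{R}_+^2}\log|s - t|\,d\mu(s)\,d\mu(t)
&= \iint_{\mathbb{R}^2}\log|x^2 - y^2|\,d\hat\mu(x)\,d\hat\mu(y) \\
&= \iint_{\mathbb{R}^2}\bigl(\log|x - y| + \log|x + y|\bigr)\,d\hat\mu(x)\,d\hat\mu(y)
 = 2\Sigma(\hat\mu).
\end{aligned}
```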

In particular, when Q(x) = px on R + with p > 0, note that pq = f° r the unique 
minimizer pq of Sq(/x), and the inequality (6.2) becomes 

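$$-\Sigma(\mu) + \rho\int_{\mathbb{R}_+} x\,d\mu(x) - \log\rho - \frac32 \le \frac{1}{\rho}\Bigl(\Phi^+(\mu) - 2\rho + \rho^2\int_{\mathbb{R}_+} x\,d\mu(x)\Bigr),$$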



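that is,

$$\chi(\mu) \ge -\frac{1}{\rho}\Phi^+(\mu) - \log\rho + \frac12\log 2\pi + \frac54$$

as long as $\int_{\mathbb{R}_+} x\,d\mu(x) < +\infty$. Maximizing the above right-hand side over $\rho > 0$ (the maximum is attained at $\rho = \Phi^+(\mu)$) gives

$$\chi(\mu) \ge \frac12\log 2\pi + \frac14 - \log\Phi^+(\mu), \tag{6.5}$$

which also follows from (2.4) combined with $\Sigma(\mu) = 2\Sigma(\hat\mu)$ and $\Phi^+(\mu) = \Phi(\hat\mu)$. Notice that $\Phi(\mu)$ in (2.4) and $\Phi^+(\mu)^2$ in (6.5) are not comparable. For example, when $\mu \in \mathcal{M}(\mathbb{R}_+)$ has the density $p(x) = (a+1)x^a\chi_{(0,1]}(x)$ with $a > -1/3$, we compute

$$\Phi(\mu) = \frac{4\pi^2(a+1)^3}{3(3a+1)}, \qquad \Phi^+(\mu) = \frac{4\pi^2(a+1)^3}{3(3a+2)},$$

so that $\Phi^+(\mu)^2/\Phi(\mu)$ converges to $0$ as $a \to -1/3$ and to $+\infty$ as $a \to +\infty$.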
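Two computations in this paragraph can be checked numerically. The Python sketch below (a) maximizes $-\Phi^+(\mu)/\rho - \log\rho$ over a grid of $\rho > 0$ and compares it with the closed-form maximizer $\rho = \Phi^+(\mu)$ and maximum $-1 - \log\Phi^+(\mu)$ that lead to (6.5), and (b) evaluates $\Phi(\mu)$ and $\Phi^+(\mu)$ for the density $p(x) = (a+1)x^a\chi_{(0,1]}(x)$. For (b) we assume Voiculescu's formula $\Phi(\mu) = \frac{4\pi^2}{3}\int p^3\,dx$ (valid for the Hilbert transform without $1/\pi$), and, via (6.3) with $Q = 0$ and $\hat p(x) = |x|p(x^2)$, $\Phi^+(\mu) = \Phi(\hat\mu) = \frac{4\pi^2}{3}\int_0^\infty x\,p(x)^3\,dx$. All function names are hypothetical.

```python
import math

C = 4.0 * math.pi ** 2 / 3.0  # Phi(mu) = C * int p^3 dx (Hilbert transform without 1/pi)


def bound_rhs(rho, phi_plus):
    """-Phi^+/rho - log(rho): the rho-dependent part of the lower bound for chi(mu)."""
    return -phi_plus / rho - math.log(rho)


def argmax_rho(phi_plus, lo=1e-3, hi=1e3, n=20_000):
    """Crude maximization of bound_rhs over a log-spaced grid in [lo, hi]."""
    grid = (lo * (hi / lo) ** (k / n) for k in range(n + 1))
    return max(grid, key=lambda r: bound_rhs(r, phi_plus))


def phi_closed(a):
    """Phi(mu) for p(x) = (a + 1) x^a on (0, 1], a > -1/3."""
    return C * (a + 1) ** 3 / (3 * a + 1)


def phi_plus_closed(a):
    """Phi^+(mu) = C * int_0^1 x p(x)^3 dx for the same density."""
    return C * (a + 1) ** 3 / (3 * a + 2)


def phi_numeric(a, weight, n=100_000):
    """C * int_0^1 weight(x) p(x)^3 dx by the midpoint rule (weight 1 -> Phi, weight x -> Phi^+)."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        total += weight(x) * ((a + 1) * x ** a) ** 3
    return C * h * total


def ratio(a):
    """Phi^+(mu)^2 / Phi(mu) = C (a+1)^3 (3a+1) / (3a+2)^2."""
    return phi_plus_closed(a) ** 2 / phi_closed(a)


print(argmax_rho(2.0), bound_rhs(2.0, 2.0), -1.0 - math.log(2.0))
for a in (-0.333, 0.0, 10.0, 100.0):
    print(a, ratio(a))
```

The printed ratios illustrate the two limits stated above: they are tiny near $a = -1/3$ and grow without bound as $a$ increases.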

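6.6. Free TCI for measures on $\mathbb{R}_+$. Consider the bijective transformation $\mu \in \mathcal{M}(\mathbb{R}_+) \mapsto \tilde\mu \in \mathcal{M}(\mathbb{R}_+)$ defined by

$$\mu(F) = \tilde\mu(\{x \in \mathbb{R}_+ : x^2 \in F\}) \quad \text{for } F \subset \mathbb{R}_+.$$

The next proposition is a free TCI when the whole space is $\mathbb{R}_+$.

Proposition 6.5. Let $Q$ be a real-valued function on $\mathbb{R}_+$. If $Q(x^2) - \rho x^2$ is convex on $\mathbb{R}$ with a constant $\rho > 0$, then

$$W(\tilde\mu, \tilde\mu_Q) \le \sqrt{\frac{1}{\rho}\Sigma_Q^+(\mu)}$$

for every compactly supported $\mu \in \mathcal{M}(\mathbb{R}_+)$, where $\mu_Q$ is the minimizer of $\Sigma_Q^+(\mu)$.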

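Proof. Let $\tilde Q(x) := \frac12 Q(x^2)$ for $x \in \mathbb{R}$ as in the proof of Proposition 6.4. Since $\hat\mu_Q$ is the minimizer of $\tilde\Sigma_{\tilde Q}(\nu)$ for $\nu \in \mathcal{M}(\mathbb{R})$, Theorem 4.5 and (6.4) imply that

$$W(\hat\mu, \hat\mu_Q) \le \sqrt{\frac{2}{\rho}\tilde\Sigma_{\tilde Q}(\hat\mu)} = \sqrt{\frac{1}{\rho}\Sigma_Q^+(\mu)}.$$

Hence, it remains to show that

$$W(\tilde\mu, \tilde\mu_Q) \le W(\hat\mu, \hat\mu_Q).$$

To prove this, let $\pi \in \Pi(\hat\mu, \hat\mu_Q)$ and define

$$\tilde\pi(G) := \pi(\{(x, y) \in \mathbb{R}\times\mathbb{R} : (|x|, |y|) \in G\})$$

for Borel sets $G \subset \mathbb{R}_+\times\mathbb{R}_+$. Then $\tilde\pi \in \Pi(\tilde\mu, \tilde\mu_Q)$, so that

$$W(\tilde\mu, \tilde\mu_Q)^2 \le \int_{\mathbb{R}_+\times\mathbb{R}_+}(x - y)^2\,d\tilde\pi(x, y) = \int_{\mathbb{R}\times\mathbb{R}}(|x| - |y|)^2\,d\pi(x, y) \le \int_{\mathbb{R}\times\mathbb{R}}(x - y)^2\,d\pi(x, y).$$

Since $\pi \in \Pi(\hat\mu, \hat\mu_Q)$ is arbitrary, this implies the desired inequality.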

□

By replacing $\tilde\mu$ by $\mu$, the above free TCI can be rewritten as

$$W(\mu, \tilde\mu_Q) \le \sqrt{\frac{1}{\rho}\left(-\iint_{\mathbb{R}_+\times\mathbb{R}_+}\log|x^2 - y^2|\,d\mu(x)\,d\mu(y) + \int_{\mathbb{R}_+} Q(x^2)\,d\mu(x) + B^+(Q)\right)}$$

for every compactly supported $\mu \in \mathcal{M}(\mathbb{R}_+)$. For example, when $Q(x) = x$ on $\mathbb{R}_+$ and $\rho = 1$, $\tilde\mu_Q$ is the quarter-semicircular distribution $\frac{1}{\pi}\sqrt{4 - x^2}\,\chi_{[0,2]}(x)\,dx$ and $B^+(Q) = -3/2$.
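The example and the coupling step in the proof of Proposition 6.5 can both be illustrated numerically. The Python sketch below assumes that, for $Q(x) = x$, the minimizer $\mu_Q$ is the free Poisson (Marchenko-Pastur) law with density $\frac{1}{2\pi}\sqrt{(4-x)/x}$ on $(0, 4]$, a standard fact not stated in the text, and checks that the density $2y\,p(y^2)$ of $\tilde\mu_Q$ is the quarter-semicircular density. It then illustrates $W(\tilde\mu, \tilde\nu) \le W(\hat\mu, \hat\nu)$ on empirical measures, using the fact that in one dimension the quadratic Wasserstein distance between two empirical measures with the same number of atoms is attained by the sorted (monotone) coupling. The sample laws and all names are hypothetical choices made for illustration.

```python
import math
import random


def mp_density(x):
    """Free Poisson (Marchenko-Pastur, rate 1) density on (0, 4]; assumed to be mu_Q for Q(x) = x."""
    return math.sqrt((4.0 - x) / x) / (2.0 * math.pi) if 0.0 < x <= 4.0 else 0.0


def sqrt_pushforward_density(y):
    """Density of mu-tilde at y > 0 when mu has density mp_density: 2 y p(y^2)."""
    return 2.0 * y * mp_density(y * y) if y > 0.0 else 0.0


def quarter_circle(y):
    """Quarter-semicircular density (1/pi) sqrt(4 - y^2) on [0, 2]."""
    return math.sqrt(4.0 - y * y) / math.pi if 0.0 <= y <= 2.0 else 0.0


def w2_empirical(xs, ys):
    """Quadratic Wasserstein distance between empirical measures with equally many atoms on R.

    In one dimension the monotone (sorted) coupling is optimal.
    """
    a, b = sorted(xs), sorted(ys)
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


# Total mass of the quarter-semicircular density (midpoint rule on [0, 2]).
n = 100_000
h = 2.0 / n
qc_mass = h * sum(quarter_circle((k + 0.5) * h) for k in range(n))

# Two symmetric samples on R (random signs) and their absolute values on R_+.
rng = random.Random(0)
xs = [rng.choice((-1.0, 1.0)) * rng.expovariate(1.0) for _ in range(2000)]
ys = [rng.choice((-1.0, 1.0)) * rng.uniform(0.0, 2.0) for _ in range(2000)]
w_sym = w2_empirical(xs, ys)
w_abs = w2_empirical([abs(x) for x in xs], [abs(y) for y in ys])
print(qc_mass, w_abs, w_sym)
```

The inequality `w_abs <= w_sym` holds for every pair of samples, not just this seed, by exactly the argument in the proof: folding the optimal coupling with $(x, y) \mapsto (|x|, |y|)$ never increases the cost.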



Appendix A. Proof of Theorem 1.2 

In the following we keep the relation $\zeta_n = (\zeta_1\cdots\zeta_{n-1})^{-1}$. The proof below is essentially the same as that in [16]. Set

$$F(\zeta, \eta) := -\log|\zeta - \eta| + \frac12\bigl(Q(\zeta) + Q(\eta)\bigr).$$

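As in [16], it suffices to prove the following inequalities:

(i)
$$\limsup_{n\to\infty}\frac{1}{n^2}\log Z_n^{SU}(Q) \le -\inf_{\mu\in\mathcal{M}(\mathbb{T})}\iint_{\mathbb{T}^2}F(\zeta,\eta)\,d\mu(\zeta)\,d\mu(\eta).$$

(ii) For every $\mu \in \mathcal{M}(\mathbb{T})$,
$$\inf_G\limsup_{n\to\infty}\frac{1}{n^2}\log\tilde\lambda_n^{SU}(Q)\Bigl(\frac1n(\delta_{\zeta_1}+\cdots+\delta_{\zeta_{n-1}}+\delta_{\zeta_n}) \in G\Bigr) \le -\iint_{\mathbb{T}^2}F(\zeta,\eta)\,d\mu(\zeta)\,d\mu(\eta) - \liminf_{n\to\infty}\frac{1}{n^2}\log Z_n^{SU}(Q),$$
where $G$ runs over all neighborhoods of $\mu$.

(iii) For every $\mu \in \mathcal{M}(\mathbb{T})$,
$$\liminf_{n\to\infty}\frac{1}{n^2}\log Z_n^{SU}(Q) \ge -\iint_{\mathbb{T}^2}F(\zeta,\eta)\,d\mu(\zeta)\,d\mu(\eta).$$

(iv) For every $\mu \in \mathcal{M}(\mathbb{T})$,
$$\inf_G\liminf_{n\to\infty}\frac{1}{n^2}\log\tilde\lambda_n^{SU}(Q)\Bigl(\frac1n(\delta_{\zeta_1}+\cdots+\delta_{\zeta_{n-1}}+\delta_{\zeta_n}) \in G\Bigr) \ge -\iint_{\mathbb{T}^2}F(\zeta,\eta)\,d\mu(\zeta)\,d\mu(\eta) - \limsup_{n\to\infty}\frac{1}{n^2}\log Z_n^{SU}(Q),$$
where $G$ is as in (ii).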

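The proofs of the first two are the same as in [16], so we omit them. To prove (iii) and (iv), we may assume (see [16]) that $\mu$ has a continuous density $f > 0$, so that $d\mu(e^{\sqrt{-1}\theta}) = f(e^{\sqrt{-1}\theta})\,d\theta/2\pi$, and $\delta \le f(\zeta) \le \delta^{-1}$ on $\mathbb{T}$ for some $\delta > 0$. For each $n \in \mathbb{N}$ choose

$$0 = b_0^{(n)} < a_1^{(n)} < b_1^{(n)} < a_2^{(n)} < b_2^{(n)} < \cdots < a_n^{(n)} < b_n^{(n)} = 2\pi$$

such that

$$\int_{b_{j-1}^{(n)}}^{b_j^{(n)}} f(e^{\sqrt{-1}\theta})\,\frac{d\theta}{2\pi} = \frac1n,$$

$$\frac{\pi\delta}{n} \le b_j^{(n)} - a_j^{(n)} \le \frac{\pi}{n\delta}, \qquad \frac{\pi\delta}{n} \le a_j^{(n)} - b_{j-1}^{(n)} \le \frac{\pi}{n\delta} \tag{A.1}$$

for all $1 \le j \le n$. Define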

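$$\Delta_n := \{(e^{\sqrt{-1}\theta_1}, \dots, e^{\sqrt{-1}\theta_{n-1}}) : a_j^{(n)} \le \theta_j \le b_j^{(n)},\ 1 \le j \le n-1\},$$

$$\Theta_n := \{(\theta_1, \dots, \theta_{n-1}) : a_j^{(n)} \le \theta_j \le b_j^{(n)},\ 1 \le j \le n-1\},$$

$$\xi_i^{(n)} := \max\{Q(e^{\sqrt{-1}\theta}) : a_i^{(n)} \le \theta \le b_i^{(n)}\} \quad \text{for } 1 \le i \le n-1,$$

$$\zeta_{ij}^{(n)} := \min\{|e^{\sqrt{-1}s} - e^{\sqrt{-1}t}| : a_i^{(n)} \le s \le b_i^{(n)},\ a_j^{(n)} \le t \le b_j^{(n)}\} \quad \text{for } 1 \le i, j \le n-1.$$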



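For every neighborhood $G$ of $\mu$, if $n$ is sufficiently large, then

$$\Delta_n \subset \Bigl\{(\zeta_1, \dots, \zeta_{n-1}) \in \mathbb{T}^{n-1} : \frac1n(\delta_{\zeta_1} + \cdots + \delta_{\zeta_n}) \in G\Bigr\},$$

so that, with $\theta_n := -(\theta_1 + \cdots + \theta_{n-1})$,

$$
\begin{aligned}
&\tilde\lambda_n^{SU}(Q)\Bigl(\frac1n(\delta_{\zeta_1} + \cdots + \delta_{\zeta_n}) \in G\Bigr) \ge \tilde\lambda_n^{SU}(Q)(\Delta_n) \\
&\quad = \frac{1}{Z_n^{SU}(Q)(2\pi)^{n-1}}\int\!\!\cdots\!\!\int_{\Theta_n}\exp\Bigl(-n\sum_{i=1}^{n}Q(e^{\sqrt{-1}\theta_i})\Bigr)\prod_{1\le i<j\le n}\bigl|e^{\sqrt{-1}\theta_i} - e^{\sqrt{-1}\theta_j}\bigr|^2\,d\theta_1\cdots d\theta_{n-1} \\
&\quad \ge \frac{1}{Z_n^{SU}(Q)(2\pi)^{n-1}}\exp\Bigl(-n\sum_{i=1}^{n-1}\xi_i^{(n)} - nM\Bigr)\prod_{1\le i<j\le n-1}\bigl(\zeta_{ij}^{(n)}\bigr)^2 \\
&\qquad\quad \times\int\!\!\cdots\!\!\int_{\Theta_n}\prod_{i=1}^{n-1}\bigl|e^{\sqrt{-1}\theta_i} - e^{-\sqrt{-1}(\theta_1 + \cdots + \theta_{n-1})}\bigr|^2\,d\theta_1\cdots d\theta_{n-1},
\end{aligned}
$$

where $M := \max\{Q(\zeta) : \zeta \in \mathbb{T}\}$. Notice that

$$\{\theta_1 + \cdots + \theta_{n-1} : (\theta_1, \dots, \theta_{n-1}) \in \Theta_n\} = \Bigl[\sum_{i=1}^{n-1}a_i^{(n)}, \sum_{i=1}^{n-1}b_i^{(n)}\Bigr],$$

and, for $n$ large enough,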



-n— 1 n— 1 



i- j=l j=l 



n— 1 n— 1 



i=i i=i 

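(A.2)

From (A.1) and (A.2) we can choose an interval $[\alpha, \beta] \subset \bigl[\sum_{i=1}^{n-1}a_i^{(n)}, \sum_{i=1}^{n-1}b_i^{(n)}\bigr]$ such that $\beta - \alpha = \pi\delta/n^2$ and

$$[-\beta, -\alpha] \subset \Bigl[b_{k-1}^{(n)} + \frac{\pi\delta}{n^2},\ a_k^{(n)} - \frac{\pi\delta}{n^2}\Bigr] \pmod{2\pi}$$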
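for some $1 \le k \le n$. Then there exist subintervals $[\alpha_i, \beta_i] \subset [a_i^{(n)}, b_i^{(n)}]$, $1 \le i \le n-1$, such that

$$\beta_i - \alpha_i = \frac{\pi\delta}{n^2(n-1)}, \qquad \sum_{i=1}^{n-1}\alpha_i = \alpha, \qquad \sum_{i=1}^{n-1}\beta_i = \beta,$$

and hence

$$
\begin{aligned}
&\int\!\!\cdots\!\!\int_{\Theta_n}\prod_{i=1}^{n-1}\bigl|e^{\sqrt{-1}\theta_i} - e^{-\sqrt{-1}(\theta_1 + \cdots + \theta_{n-1})}\bigr|^2\,d\theta_1\cdots d\theta_{n-1} \\
&\quad \ge \int_{\alpha_1}^{\beta_1}\!\!\cdots\int_{\alpha_{n-1}}^{\beta_{n-1}}\prod_{i=1}^{n-1}\bigl|e^{\sqrt{-1}\theta_i} - e^{-\sqrt{-1}(\theta_1 + \cdots + \theta_{n-1})}\bigr|^2\,d\theta_1\cdots d\theta_{n-1} \\
&\quad \ge \Bigl(\frac{2\delta}{n^2}\Bigr)^{2(n-1)}\Bigl(\frac{\pi\delta}{n^2(n-1)}\Bigr)^{n-1}.
\end{aligned}
$$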
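The constants in the last display, and the way they enter the next estimate, can be traced as follows. This is a sketch that fills in two elementary steps we assume are intended: the chord bound $|e^{\sqrt{-1}s} - e^{\sqrt{-1}t}| \ge \frac{2}{\pi}d$, where $d \le \pi$ is the angular distance between $s$ and $t$, and the crude estimate $n^{6(n-1)}(n-1)^{n-1} \le n^{7(n-1)}$. On the box $\prod_i[\alpha_i, \beta_i]$ the point $e^{-\sqrt{-1}(\theta_1+\cdots+\theta_{n-1})}$ stays at angular distance at least $\pi\delta/n^2$ from every arc $[a_i^{(n)}, b_i^{(n)}]$, which gives the first factor, and the box has volume $(\pi\delta/(n^2(n-1)))^{n-1}$; the factor $(2\pi)^{-(n-1)}$ is the normalization from the earlier display.

```latex
\begin{aligned}
\bigl|e^{\sqrt{-1}\theta_i} - e^{-\sqrt{-1}(\theta_1+\cdots+\theta_{n-1})}\bigr|
&\ge \frac{2}{\pi}\cdot\frac{\pi\delta}{n^2} = \frac{2\delta}{n^2},
\qquad (\theta_1,\dots,\theta_{n-1}) \in \prod_{i=1}^{n-1}[\alpha_i,\beta_i], \\
\frac{1}{(2\pi)^{n-1}}\Bigl(\frac{2\delta}{n^2}\Bigr)^{2(n-1)}\Bigl(\frac{\pi\delta}{n^2(n-1)}\Bigr)^{n-1}
&= \frac{(2\delta^3)^{n-1}}{n^{6(n-1)}(n-1)^{n-1}}
 \ge \frac{(2\delta^3)^{n-1}}{n^{7(n-1)}}, \\
\frac{1}{n^2}\log\frac{(2\delta^3)^{n-1}}{n^{7(n-1)}}
&= \frac{(n-1)\bigl(\log(2\delta^3) - 7\log n\bigr)}{n^2} \longrightarrow 0
 \qquad (n\to\infty).
\end{aligned}
```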

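Therefore, for sufficiently large $n$, we get

$$\tilde\lambda_n^{SU}(Q)\Bigl(\frac1n(\delta_{\zeta_1} + \cdots + \delta_{\zeta_n}) \in G\Bigr) \ge \frac{(2\delta^3)^{n-1}}{Z_n^{SU}(Q)\,n^{7(n-1)}}\exp\Bigl(-n\sum_{i=1}^{n-1}\xi_i^{(n)} - nM\Bigr)\prod_{1\le i<j\le n-1}\bigl(\zeta_{ij}^{(n)}\bigr)^2.$$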



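Since

$$\lim_{n\to\infty}\frac{2}{n^2}\sum_{1\le i<j\le n-1}\log\zeta_{ij}^{(n)} = \frac{1}{(2\pi)^2}\int_0^{2\pi}\!\!\int_0^{2\pi}f(e^{\sqrt{-1}s})f(e^{\sqrt{-1}t})\log\bigl|e^{\sqrt{-1}s} - e^{\sqrt{-1}t}\bigr|\,ds\,dt = \iint_{\mathbb{T}^2}\log|\zeta - \eta|\,d\mu(\zeta)\,d\mu(\eta)$$

as well as

$$\lim_{n\to\infty}\frac1n\sum_{i=1}^{n-1}\xi_i^{(n)} = \frac{1}{2\pi}\int_0^{2\pi}Q(e^{\sqrt{-1}\theta})f(e^{\sqrt{-1}\theta})\,d\theta = \int_{\mathbb{T}}Q(\zeta)\,d\mu(\zeta),$$

we have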



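$$0 \ge \limsup_{n\to\infty}\frac{1}{n^2}\log\tilde\lambda_n^{SU}(Q)\Bigl(\frac1n(\delta_{\zeta_1} + \cdots + \delta_{\zeta_n}) \in G\Bigr) \ge -\iint_{\mathbb{T}^2}F(\zeta,\eta)\,d\mu(\zeta)\,d\mu(\eta) - \liminf_{n\to\infty}\frac{1}{n^2}\log Z_n^{SU}(Q)$$

(the first inequality holds because $\tilde\lambda_n^{SU}(Q)$ is a probability measure) and

$$\liminf_{n\to\infty}\frac{1}{n^2}\log\tilde\lambda_n^{SU}(Q)\Bigl(\frac1n(\delta_{\zeta_1} + \cdots + \delta_{\zeta_n}) \in G\Bigr) \ge -\iint_{\mathbb{T}^2}F(\zeta,\eta)\,d\mu(\zeta)\,d\mu(\eta) - \limsup_{n\to\infty}\frac{1}{n^2}\log Z_n^{SU}(Q).$$

Since $G$ is an arbitrary neighborhood of $\mu$, these imply (iii) and (iv).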



□ 
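The two Riemann-sum limits used at the end of the proof can be checked numerically in the simplest case. The Python sketch below assumes $f \equiv 1$, so that $\mu$ is the normalized arc-length measure $d\theta/2\pi$ and one may take $\delta = 1$, with $b_j^{(n)} = 2\pi j/n$ and $a_j^{(n)}$ the midpoint of $[b_{j-1}^{(n)}, b_j^{(n)}]$, which satisfies (A.1). It also takes $Q(e^{\sqrt{-1}\theta}) = \cos\theta$ as an illustrative potential. Since the logarithmic energy of $d\theta/2\pi$ is $0$ and $\int\cos\theta\,d\theta/2\pi = 0$, both $\frac{2}{n^2}\sum_{i<j}\log\zeta_{ij}^{(n)}$ and $\frac1n\sum_i\xi_i^{(n)}$ should be close to $0$ for large $n$. The helper names are hypothetical.

```python
import math


def uniform_partition(n):
    """Points b_j = 2 pi j / n (j = 0..n) and midpoints a_j = b_j - pi / n (j = 1..n) for f = 1."""
    b = [2.0 * math.pi * j / n for j in range(n + 1)]
    a = [math.nan] + [b[j] - math.pi / n for j in range(1, n + 1)]
    return a, b


def zeta_ij(a, b, i, j):
    """min |e^{is} - e^{it}| over s in [a_i, b_i], t in [a_j, b_j], for i < j.

    The smaller of the two angular gaps between the arcs is at most pi,
    and the chord length for an angular gap g is 2 sin(g / 2).
    """
    gap = min(a[j] - b[i], a[i] + 2.0 * math.pi - b[j])
    return 2.0 * math.sin(gap / 2.0)


def log_energy_sum(n):
    """(2 / n^2) * sum over 1 <= i < j <= n-1 of log zeta_ij^{(n)}."""
    a, b = uniform_partition(n)
    s = sum(math.log(zeta_ij(a, b, i, j))
            for i in range(1, n) for j in range(i + 1, n))
    return 2.0 * s / n ** 2


def potential_sum(n, Q=math.cos):
    """(1 / n) * sum_{i=1}^{n-1} xi_i^{(n)}, with xi_i the max of Q over [a_i, b_i].

    cos has no interior local maximum in (0, 2 pi), so on these arcs the max is at an endpoint.
    """
    a, b = uniform_partition(n)
    return sum(max(Q(a[i]), Q(b[i])) for i in range(1, n)) / n


for n in (50, 200):
    print(n, log_energy_sum(n), potential_sum(n))
```

For a non-constant density $f$ one would instead place the $b_j^{(n)}$ at the $j/n$ quantiles of $\mu$; the uniform case is used here only because the limits are known in closed form.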